Journal of Systems Architecture 52 (2006) 1–12
www.elsevier.com/locate/sysarc

A core generator for arithmetic cores and testing structures with a network interface ☆

D. Bakalis a,c,*, K.D. Adaos a, D. Lymperopoulos a, M. Bellos a, H.T. Vergos a, G.Ph. Alexiou a,b, D. Nikolos a,b

a Computer Engineering and Informatics Department, University of Patras, 265 00 Patras, Greece
b Research Academic Computer Technology Institute, 61 Riga Feraiou Str., 262 21 Patras, Greece
c Electronics Laboratory, Department of Physics, University of Patras, 265 00 Patras, Greece

Received 7 October 2002; received in revised form 30 September 2003; accepted 16 December 2004
Available online 13 June 2005

Abstract

We present Eudoxus, a tool for generating architectural variants of arithmetic soft cores and testing structures, targeting a wide variety of functions, operand sizes and architectures. Eudoxus produces structural and synthesizable VHDL and/or Verilog descriptions for: (a) several arithmetic operations including addition, subtraction, multiplication, division, squaring, square rooting and shifting, and (b) several testing structures that can be used as test pattern generators and test response compactors. Interaction with the user takes place through a network interface. Since the end user is presented with a variety of unencrypted structural cores, each one describing an architecture with its own area, delay and power characteristics, he can choose the one that best fits his specific needs and then optimize or customize it further. Therefore, designs utilizing these cores are completed in less time and with less effort.

© 2005 Elsevier B.V. All rights reserved.

☆ Based on "EUDOXUS: A WWW-based Generator of Reusable Arithmetic Cores", by D. Bakalis, K.D. Adaos, D. Lymperopoulos, G.Ph. Alexiou and D. Nikolos, which appeared in Proceedings of 12th IEEE International Workshop on Rapid System Prototyping, Monterey, CA, USA, pp. 182–187. © 2001 IEEE.

1. Introduction
* Corresponding author. Tel.: +30 2610 996287; fax: +30 2610 997456. E-mail address: [email protected] (D. Bakalis).
1383-7621/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.sysarc.2004.12.006

Engineers who deal with System-on-a-Chip (SoC) design face the challenge of integrating a rich set of features in a single piece of silicon, with high performance requirements and progressively shorter product life cycles, in order to meet today's strict time-to-market goals [1,2]. To cope with these challenges, a block-based approach that emphasizes design reuse becomes necessary.

The use of pre-designed and pre-verified design blocks (cores), available in technology-independent descriptions such as Hardware Description Languages (HDLs), reduces time-to-market through faster design cycles and leads to higher productivity of the design teams. Moreover, it minimizes risk and enables cost-effective solutions. To be effective, this block-based design methodology requires access to an extensive library of reusable cores or, equivalently, to a versatile core generator.

The size and complexity of today's SoCs also require the design to be verified before its ASIC implementation. This verification step is commonly accomplished on a platform of reconfigurable hardware that uses one or more FPGAs. To exploit the full potential of the rapid prototyping process and minimize the development time and effort, it is desirable to use the same HDL description in both the ASIC and the FPGA implementation.

As the amount of information required to complete a design is increasing, the Internet has emerged as a suitable tool for the delivery of information.
Network-based applications and design tools have become popular in recent years [3,4] because they offer several advantages over conventional workstation-hosted design techniques. First, design tools are developed for a single platform, which reduces cost. Furthermore, upgrades and bug fixes are immediately available to users, and installation costs are minimized. Moreover, better access control can be achieved. Finally, performance-intensive design tools can run on powerful network servers, yet be accessed from low-performance, low-cost desktop computers [5]. Several attempts at producing network-based design tools and applications have been reported in recent years [5–14].

Arithmetic modules, such as adders and multipliers, are common and essential building blocks for many applications [15]. Built-In Self-Test (BIST) structures are also candidates for reuse across multiple designs. Algorithms for arithmetic or testing operations, coded in a technology-independent HDL and synthesizable in a variety of commercial products, offer the designer efficient and versatile building blocks for developing complex System-on-a-Chip solutions.

Most available synthesis tools offer some kind of block generator for arithmetic modules. A common limitation is the lack of a variety of architectures for all possible implementation technologies, together with the fact that the produced modules are encrypted and therefore cannot be customized. Several arithmetic module generators have been presented in recent years. In [12] an on-line arithmetic design tool for FPGAs is presented. The tool was developed in the Java programming language and generates standard VHDL code. However, it is restricted to on-line arithmetic, that is, all operations are performed digit serially, most significant bit first. A web-based arithmetic module generator capable of producing multipliers, squarers and adders is presented in [13].
In [16] a module generator for complex arithmetic structures is presented. Rather than using distinct adder or multiplier modules, this generator combines the supported arithmetic expressions into a single module. The disadvantage of this approach is that it prevents the designer from examining different alternatives to meet his design goals. The authors of [17] present a macro-cell generator for floating-point division and square root operations. A module generator for logic-emulation applications, which is able to generate macro-cells of arbitrarily complex functions described in HDL, is proposed in [18]. This module generator accepts a Verilog description and produces partitioned Configurable Logic Block (CLB) netlists for decomposing the design into multiple FPGA chips. In [19], a generator for multiplier and squarer structures is proposed. The generator utilizes Wallace tree summation and produces synthesizable VHDL code. However, it is a stand-alone tool limited to only these two types of arithmetic structures. Another stand-alone generator, which implements multiplexer-based structures, is presented in [20].

In this work we present Eudoxus, a generator of arithmetic cores and testing structures with a network interface. (The core generator is named after Eudoxus, the ancient Greek mathematician and astronomer who substantially advanced number theory.) The output of Eudoxus is synthesizable VHDL and/or Verilog structural descriptions of the arithmetic modules or testing structures. The user only needs to provide the type of the requested operation and the size and form of the operands. Since a completely structural, non-encrypted description is produced, reusability is assured. Eudoxus handles several arithmetic operations, such as addition, subtraction, multiplication, division, square rooting, squaring and shifting.
Moreover, for each such operation Eudoxus can produce more than one distinct architecture. The designer can therefore select the one that best meets his specifications, or customize the one closest to them. In this way, the generator both alleviates the design effort and helps reduce time to market. Finally, Eudoxus can produce testing structures that are based on Linear Feedback Shift Registers (LFSRs), Cellular Automata (CA) and accumulators. These testing structures can be used for pseudorandom test pattern generation and test response compaction in Built-In Self-Test.

The core generator in this work is a direct improvement of the generator presented in [21]. Several new arithmetic cores have been added to the generator, as well as the testing structures. The number of different cores supported in the new version of the generator is twice the number supported in the original version. Furthermore, design examples and comparisons are also included in this work.

The rest of the paper is organized as follows: Section 2 presents the core generator. Two examples of circuits whose design was simplified by the proposed generator are given in Section 3. A comparison between the results obtained after synthesis of the descriptions produced by Eudoxus and those produced by commercial generators is given in Section 4.

2. The core generator

The characteristics of the core generator, the way it produces the cores, the currently supported cores and its network interface are explained in the following.

2.1. Characteristics of the core generator

The following decisions were made regarding the implementation of the core generator. To ensure technology independence and reusability, Eudoxus produces its descriptions at an abstract level, that is, in structural Verilog and/or VHDL.
The need for supporting both languages is driven by the fact that both of them are considered established standards in the design world. The support of both HDLs helps the designer to easily integrate a module with the rest of a design and map it to a specific technology library through the use of any commercially available logic synthesis tool. Furthermore, since the cores are not encrypted, customization is possible.

The cores generated by the tool are described in a hierarchical way, with small behavioral or structural subfunctions in the leaf cells (e.g. full adders, half adders). To further optimize his design, the designer may therefore define his own gate-level implementations of the leaf cells, even if he has no knowledge of the algorithm used for implementing the arithmetic operation. He can also insert pipeline registers between the stages of a core, or combine more than one core to implement specialized operations.

Eudoxus supports a comprehensive set of different architectures for each of the four basic arithmetic operations (addition, subtraction, multiplication and division), assuming that the operands are applied either in parallel or serially. This feature provides the designer with a range of design alternatives for a specific type of arithmetic operation and the ability to choose the one that best meets his area, delay, power and testability constraints. Moreover, Eudoxus is able to produce some other arithmetic modules that are frequently used in system design, such as fractional and integer squarers and square rooters and multiplexer-based shifters. It also supports a comprehensive set of different architectures for pseudorandom test pattern generation and test response compaction.

Finally, Eudoxus is implemented as a software tool with a network-based user interface, enabling the designer to interact with it without having to install it on a local workstation. This remote operation additionally simplifies the maintenance of the tool.

2.2. Core production

In this section we briefly describe the way Eudoxus produces the cores. Every supported architecture has a layout regularity, that is, each architecture can be thought of as a regular interconnection of basic design blocks (cells). The cells may be as simple as a single gate or slightly more complex, for example a full adder, a Booth encoder or a carry save adder stage. The regularity of an architecture can be either linear or two-dimensional. For example, a ripple carry adder has a linear regularity, whereas a parallel prefix adder or an array multiplier has a two-dimensional one.

Once the functionality, the architecture and the size of the requested core have been specified by the user, Eudoxus produces a structural HDL description of the core based on the following steps:

Step 1 (core definition): The names of the primary inputs and primary outputs of the core are determined based on its size and functionality. Information about the type of the core as well as its primary inputs and outputs is then placed at the beginning of the core description.

Step 2 (basic cell definition): The generator, using an internal look-up table, determines the basic cells that are required to create the core, e.g. full adders, one-bit controlled adders/subtracters, etc. Their HDL descriptions are then produced and placed after the core information part in the core description.

Step 3 (core description): The module of the requested core is produced next. The size of the core determines the number and type of the basic cells that are needed. At first, the module's input and output ports are declared. Then multiple instances of each of the basic cells are produced.
In every instantiated basic cell, each input is connected to either a primary input of the core or an output of another basic cell, whereas each output is connected to either a primary output of the core or an input of another basic cell. The generator, according to the functionality and architecture of the requested core, then determines the interconnections between the basic cells and the primary inputs and outputs of the core.

Step 4 (test-bench): Once the core has been fully described in HDL, a test-bench for functional verification is also produced. The test-bench provides input vectors to the core, giving the designer the ability to check its correct functionality with a logic simulation tool.

2.3. Arithmetic cores, testing structures and supported sizes

The currently supported cores are presented in Table 1. This table is divided into four major parts. The first one presents arithmetic cores where all operands are applied in parallel (Parallel Arithmetic Cores). The second part presents the cores where one operand is applied in parallel whereas the other operand is applied serially (Serial/Parallel Arithmetic Cores), and the third part presents the cores where all operands are applied serially (Serial Arithmetic Cores). The fourth part presents some structures that are suitable for pseudorandom testing of the arithmetic cores (Testing Structures). These structures include pseudorandom test pattern generators, based on Linear Feedback Shift Registers (LFSRs), Cellular Automata (CA) and accumulators, and test response analyzers based on Multiple Input Linear Feedback Shift Registers (MISRs). They can be combined with circuits under test to produce complete BIST schemes.

We have to note that all the above algorithms lead to highly regular structures, that is, the produced cores are designed in the form of regularly repeated patterns of identical circuits.
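The cell-based construction described in Steps 1 to 3 of Section 2.2 can be illustrated with a small sketch. The Python code below is not the Eudoxus implementation (the paper states only that the generator is written in ANSI C); it is a minimal illustration, under that caveat, of how a linear regularity such as a ripple carry adder can be emitted as structural Verilog. The names `full_adder_cell`, `generate_ripple_carry_adder`, `full_adder` and `rca_<n>` are hypothetical, and the test-bench of Step 4 is omitted.

```python
def full_adder_cell() -> str:
    """Step 2 (illustrative): HDL text of the leaf cell, a behavioral full adder."""
    return (
        "module full_adder (input a, input b, input cin,\n"
        "                   output sum, output cout);\n"
        "  assign sum  = a ^ b ^ cin;\n"
        "  assign cout = (a & b) | (a & cin) | (b & cin);\n"
        "endmodule\n"
    )


def generate_ripple_carry_adder(n: int) -> str:
    """Return structural Verilog text for an n-bit ripple carry adder."""
    if n < 2:
        raise ValueError("operand size must be at least 2")
    lines = []
    # Step 1: header comments naming the core and its primary inputs/outputs.
    lines.append(f"// Core: {n}-bit ripple carry adder")
    lines.append(f"// Inputs: a[{n - 1}:0], b[{n - 1}:0], cin"
                 f"  Outputs: s[{n - 1}:0], cout")
    # Step 2: definitions of the basic cells follow the header.
    lines.append(full_adder_cell())
    # Step 3: declare the ports, then instantiate one cell per bit.
    lines.append(f"module rca_{n} (input [{n - 1}:0] a, input [{n - 1}:0] b,")
    lines.append(f"               input cin, output [{n - 1}:0] s, output cout);")
    lines.append(f"  wire [{n}:0] c;  // carry chain linking neighbouring cells")
    lines.append("  assign c[0] = cin;")
    for i in range(n):
        lines.append(
            f"  full_adder fa{i} (.a(a[{i}]), .b(b[{i}]), .cin(c[{i}]), "
            f".sum(s[{i}]), .cout(c[{i + 1}]));"
        )
    lines.append(f"  assign cout = c[{n}];")
    lines.append("endmodule")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_ripple_carry_adder(4))
```

Because every bit position uses the same cell and the same wiring pattern, the operand size only changes the loop bound. A two-dimensional regularity (for example an array multiplier) would need a nested loop in place of the single one.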
Thus, due to their geometrical regularity, they are suitable for efficient synthesis and implementation in both FPGA and ASIC technologies.

The implementation of the above algorithms for arithmetic operations imposes some restrictions on the size of the operands that can be handled. For example, since we use two-bit recoding for Booth multipliers, the operand size must be a multiple of 2. Moreover, in the current version of the tool, we support only modules with equally sized operands (in the case of dividers, the size of one operand is twice the size of the other). The currently supported sizes are also shown in Table 1. The third and fifth columns of Table 1 indicate the smallest and largest size supported. The fourth column indicates the step between two supported sizes. For example, in the case of parallel multi-level carry look-ahead adders, the tool can produce all cores with operand size a multiple of 4, e.g. 4, 8, 12, ..., 256.

Table 1
Arithmetic cores, testing structures and operand sizes

| Operation | Architecture | Minimum size | Step | Maximum size |
|---|---|---|---|---|
| Parallel Arithmetic Cores | | | | |
| Adders/subtracters | Ripple Carry Adder/Subtracter | 2 | 1 | 1024 |
| | Group Carry Look-Ahead Adder/Subtracter | 4 | 4 | 1024 |
| Adders | Ripple Carry Adder | 2 | 1 | 1024 |
| | Group Carry Look-Ahead Adder | 4 | 4 | 1024 |
| | Multi-Level Carry Look-Ahead Adder [22] | 4 | 4 | 256 |
| | Ladner-Fischer Parallel-Prefix Adder [23] | 2 | 1 | 1024 |
| | Kogge–Stone Parallel-Prefix Adder [24] | 2 | 1 | 1024 |
| Subtracters | Ripple Carry Subtracter | 2 | 1 | 1024 |
| | Group Carry Look-Ahead Subtracter | 4 | 4 | 1024 |
| Multipliers | Carry Propagate Array Multiplier | 2 | 1 | 256 |
| | Carry Save Array Multiplier [22] | 2 | 1 | 256 |
| | TriSection Pezaris Array Multiplier [25] | 2 | 1 | 256 |
| | Baugh–Wooley Array Multiplier [26] | 2 | 1 | 256 |
| | Modified Booth Multiplier [27] | 2 | 2 | 256 |
| Dividers | Restoring Cellular Array Dividers [22] | 8, 16, 32, 64 | | |
| | Non-Restoring Cellular Array Dividers [22] | 8, 16, 32, 64 | | |
| Square rooters | Non-Restoring Fractional Square Rooter [22] | 4 | 2 | 256 |
| Squarers | Non-Restoring Fractional Squarer [22] | 4 | 2 | 256 |
| Shifters | Multiplexer-based Shifter [28] | 2 | 1 | 256 |
| Serial/Parallel Arithmetic Cores | | | | |
| Multipliers | Unsigned Serial/Parallel Multiplier | 2 | 1 | 256 |
| | 2's complement Serial/Parallel Multiplier [29] | 2 | 1 | 256 |
| Serial Arithmetic Cores | | | | |
| Adders | Serial adder | 2 | 1 | 1024 |
| | Serial Column Adder | 4 | 2 | 1024 |
| Multipliers | Serial multiplier [30] | 4 | 2 | 1024 |
| Squarers | Serial squarer [30] | 4 | 2 | 1024 |
| Complementers | Serial 1's/2's complement | 2 | 1 | 1024 |
| Testing Structures | | | | |
| Pseudorandom test pattern generators | Linear Feedback Shift Registers (LFSRs) [31] | 2 | 1 | 256 |
| | Cellular Automata based [32] | 2 | 1 | 256 |
| | Accumulator-based without carry feedback [33] | 2 | 1 | 1024 |
| | Accumulator-based with carry feedback [33] | 2 | 1 | 1024 |
| Response analyzers | Multiple Input Linear Feedback Shift Register (MISR) | 2 | 1 | 256 |

2.4. Structure of the core generator

The interface of the core generator is described in Fig. 1. The arrows define the flow of information between the client where the designer stands and the server where the generator is installed.

[Fig. 1. The interface of the generator.]

The designer, using any web-browsing tool, initiates a request to use the generator. If he is authorized to use the tool, a parameter form is passed to him. Information about the supported cores and sizes is available to help him make his decisions. The designer defines the type of the arithmetic or testing operation, the size of the operands and the HDL of the requested core. Then, the parameters are passed back to the http server, which initiates the core generator as a system process. Eudoxus produces the requested core and stores it in the http server. Then a hyperlink to the produced core is passed to the client, which is able to browse or download the core.
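The size rules of Table 1 (minimum size, step, maximum size) amount to a simple arithmetic check that could be applied to the parameters submitted through the form before the generator is started. The Python sketch below illustrates that check for three rows of Table 1. It is an assumption about how such a validation might look, not a description of the actual CGI/Perl scripts; the dictionary keys and the function names `is_supported` and `supported_sizes` are hypothetical.

```python
from typing import List, NamedTuple


class SizeRule(NamedTuple):
    minimum: int
    step: int
    maximum: int


# A few rows copied from Table 1; the keys are hypothetical identifiers.
SIZE_RULES = {
    "ripple_carry_adder": SizeRule(2, 1, 1024),
    "multi_level_cla_adder": SizeRule(4, 4, 256),
    "modified_booth_multiplier": SizeRule(2, 2, 256),
}


def is_supported(core: str, size: int) -> bool:
    """True if `size` lies on the grid minimum, minimum + step, ..., maximum."""
    rule = SIZE_RULES[core]
    in_range = rule.minimum <= size <= rule.maximum
    return in_range and (size - rule.minimum) % rule.step == 0


def supported_sizes(core: str) -> List[int]:
    """Enumerate every operand size the rule allows for the given core."""
    rule = SIZE_RULES[core]
    return list(range(rule.minimum, rule.maximum + 1, rule.step))


if __name__ == "__main__":
    print(is_supported("multi_level_cla_adder", 12))
    print(supported_sizes("multi_level_cla_adder")[:3])
```

For the multi-level carry look-ahead adder this reproduces the sequence 4, 8, 12, ..., 256 quoted in Section 2.3. Cores whose sizes are given as an explicit list, such as the dividers, would need a set of allowed values instead of a rule of this form.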
At the client, the core can be combined with the rest of a design and mapped to a specific technology with a logic synthesis tool, or it can be simulated for functional verification using the test-bench module that is produced together with the core. Some snapshots of Eudoxus are given in Fig. 2. The generator of arithmetic cores and testing structures has been implemented in ANSI C, whereas the user interface is based on HTML and CGI/Perl scripts.

[Fig. 2. Snapshots of Eudoxus.]

3. Design examples

The applicability of the presented generator is demonstrated in this section through the design of two applications, namely (a) a floating-point multiplier, and (b) a FIR digital filter.

3.1. Floating-point multiplier

We have used the core generator to design a floating-point multiplier that supports the IEEE 754-1985 standard [34]. This multiplier supports both the single precision and the double precision format of the above standard and can perform a number of functions. Specifically:

• it can multiply two single precision operands and produce the result in the same format,
• it can multiply two double precision operands and produce the result in the same format, and
• it can be fed with two single precision operands, convert them to double precision and provide the result in the latter format.

The conversion logic block depicted in Fig. 3 contains the circuitry required to convert the operands from one format to another.

At first, the multiplier's functionality was described in behavioral HDL and simulated against a software model. Several design alternatives then had to be examined to reach an implementation that balances area and performance requirements. The floating-point multiplier design makes use of three types of arithmetic cores (see Fig.
3):

• A 53-bit unsigned multiplier for the mantissas.
• Two 11-bit integer adders/subtracters for the exponents. The first one is used to subtract the bias from the first exponent, whereas the second adds the second exponent.
• An 11-bit incrementer used for adjusting the exponent.
• A 53-bit incrementer for the rounding procedure.

[Fig. 3. A floating-point, double precision, IEEE standard multiplier.]

The maximum delay of the floating-point multiplier depends heavily on the delay of the unsigned multiplier for the mantissas. To this end, several implementations of this fixed-point multiplier had to be taken into account. Table 2 presents the performance and area results attained for the floating-point multiplier when alternative architectures, provided by our generator, were used for the mantissa multiplication. We have used Group Carry Look-Ahead cores for both the 11-bit adders/subtracters and the two incrementers. Results were obtained using the Synopsys Design Compiler logic synthesis tool along with a 0.25 µm CMOS technology. The availability of the presented core generator enabled us to examine and evaluate all the different architectures of Table 2 in a very short time and with zero design effort.

3.2. FIR digital filter

A second example of the use of Eudoxus is the design of a Finite Impulse Response (FIR) digital filter with variable coefficients. We have implemented a 16-tap FIR filter with coefficient and data width of eight bits. The filter's response is given by

$$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \qquad (1)$$

where N is the number of taps (here N = 16), the filter coefficients are denoted by h(k), and x(k) denotes the input at time k. By proper design, the FIR filter can have linear phase. In this case, the filter coefficients are symmetrical and Eq. (2) holds.
$$h(k) = h(15-k), \qquad k = 0, \ldots, 15 \qquad (2)$$

Eq. (1) in this case can be expressed as follows:

$$y(n) = \sum_{k=0}^{7} \left[ x(n-k) + x(n-15+k) \right] h(k) \qquad (3)$$

Symmetrical coefficients lead to a reduction of the number of multiplications to one half. Fig. 4 presents a possible structure of the filter described by Eq. (3). This architecture was used in our design example.

[Fig. 4. 16-tap linear-phase FIR filter. A chain of delay elements (D) provides x(n) to x(n-15); the samples are added in pairs, x(n) with x(n-15) through x(n-7) with x(n-8), each sum is multiplied by h(0) to h(7), and the products are summed to give y(n).]

Table 2
Floating-point multiplier's area and delay results for three different unsigned multiplier architectures

| Unsigned multiplier architecture | Floating-point multiplier area (sq. microns) | Floating-point multiplier delay (ns) |
|---|---|---|
| 54-bit Baugh–Wooley | 864,151 | 41.80 |
| 53-bit Carry Save Array | 690,870 | 30.06 |
| 54-bit Modified Booth | 579,522 | 14.00 |

Table 3
FIR filter implementation results

| Multiplier | ASIC (library: 0.25 µm CMOS): area (sq. microns) | ASIC: speed (MHz) | Altera FPGA (device: EPF10K100EQC208-1): area (logic cells) | Altera FPGA: speed (MHz) |
|---|---|---|---|---|
| Modified Booth | 160,863 | 140 | 2589 | 19.68 |
| Baugh–Wooley | 191,568 | 120 | 1848 | 21.73 |
| Pezaris | 194,032 | 125 | 2116 | 22.72 |

We have implemented and evaluated three different versions of this FIR filter architecture by using three different kinds of multiplication units. Using Eudoxus, we created Verilog descriptions for the multipliers. These descriptions were included in custom (hand-written) Verilog code describing the other parts of the FIR filter. The complete design was implemented in both FPGA and ASIC technology using Altera's MAX+Plus II and Synopsys' Design Compiler, respectively. The three versions of the multiplier unit were Modified Booth, Baugh–Wooley and Pezaris. Each multiplier block is used eight times in the corresponding filter implementation.
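The saving obtained from symmetrical coefficients can be checked numerically. The Python sketch below compares the direct 16-tap form with a folded form that first adds the sample pairs x(n-k) and x(n-15+k), as in the structure of Fig. 4, and then performs only eight multiplications per output. The function names, the coefficient values and the convention that samples outside the input list are zero are assumptions made for illustration; this is not the Verilog used in the design example.

```python
from typing import Sequence


def sample(x: Sequence[int], m: int) -> int:
    """Input sample x(m); samples outside the recorded range are taken as 0."""
    return x[m] if 0 <= m < len(x) else 0


def fir_direct(h: Sequence[int], x: Sequence[int], n: int) -> int:
    """Direct form: one multiplication per tap (16 for a 16-tap filter)."""
    return sum(h[k] * sample(x, n - k) for k in range(len(h)))


def fir_folded(h_half: Sequence[int], x: Sequence[int], n: int) -> int:
    """Folded form for symmetric coefficients, h(k) = h(taps - 1 - k).

    Each pair of samples sharing a coefficient is added first, so only
    len(h_half) multiplications are needed per output sample.
    """
    taps = 2 * len(h_half)
    return sum(
        h_half[k] * (sample(x, n - k) + sample(x, n - (taps - 1) + k))
        for k in range(len(h_half))
    )


if __name__ == "__main__":
    h_half = [1, -2, 3, 5, 8, 13, 21, 34]   # arbitrary example coefficients
    h = h_half + h_half[::-1]               # symmetric 16-tap response
    x = [(7 * i) % 11 - 5 for i in range(40)]
    assert all(fir_direct(h, x, n) == fir_folded(h_half, x, n)
               for n in range(len(x)))
    print("folded and direct forms agree")
```

Integer data are used so that the two forms can be compared exactly. With eight-bit coefficients and data, as in the filter described above, the pre-addition makes one multiplier operand a bit wider, which is the usual price of halving the number of multipliers.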
Table 3 summarizes the implementation results of these three versions.

4. Evaluation of the proposed generator and comparison against commercial generators

Most available synthesis tools, whether they target ASIC or FPGA technologies, offer some kind of core generator for arithmetic modules or testing structures. Examples are the Synopsys DesignWare library, the Xilinx Core Generator and the Altera MegaFunction Wizard. To judge the efficiency of the core descriptions produced by Eudoxus, we compared a number of parallel adder and multiplier cores against the cores produced by Synopsys DesignWare using a 0.25 µm CMOS technology. The area and delay results of the synthesis process are shown in Table 4. In all examined cases, the area and delay of the cores generated by Eudoxus are comparable to those of the DesignWare generated cores. Furthermore, in some cases, Eudoxus generated cores that are faster than those generated by DesignWare. We thus conclude that Eudoxus can produce cores that are as efficient as those generated by commercial generators. This can be attributed to the fact that the cores generated by Eudoxus implement efficient algorithms for arithmetic operations. Similar comparisons of Eudoxus against the Altera or Xilinx core generators, however, showed that in some cases the Altera or Xilinx cores are better than those of Eudoxus. This is due to the fact that Altera
This is due to the fact that Altera Table 4 Comparison of the proposed generator and the Synopsys DesignWare library Circuit Size Synopsys DesignWare Area (sq.microns) Eudoxus Delay (ns) Area (sq.microns) Delay (ns) Ripple Carry Adder 8 32 1014 4055 1.75 6.88 1584 6336 1.69 6.42 Carry Look-Ahead Adder 8 32 1473 6399 1.20 4.22 1449 4910 1.33 1.56 Parallel-Prefix Addera 8 32 1821 9258 0.68 1.59 2740 16,101 0.82 1.18 Booth Multiplier 8 16 11,025 39,006 2.95 4.16 12,260 50,308 3.02 3.95 a Parallel-prefix adders are based on the Brent–Kung architecture in Synopsys DesignWare and on the Kogge–Stone architecture in Eudoxus. 10 D. Bakalis et al. / Journal of Systems Architecture 52 (2006) 1–12 or Xilinx core generators exploit the structure of the LUTs of the FPGA in order to drive the synthesis tool and produce optimized cores (for example they can instruct the synthesis tool to use effectively the fast carry chains) whereas Eudoxus has no knowledge of the underlying technology and therefore the produced cores strongly depend on the efficiency of the FPGA synthesis tool. However, even in these cases, the user can either customize the Eudoxus generated cores or select an alternative core in order to achieve the required characteristics in terms of area, delay and power. This is not feasible with the FPGA commercial generators. The main advantage of Eudoxus compared to generators that accompany synthesis tools is the flexibility and portability that it offers. In commercial generators, the user can only define the parameters of the core that he is interested in and integrate the core with the rest of the design. In several cases, the user does not have access inside the produced core, which is somehow encrypted, and therefore modifications and customization are prohibited. Finally, the generator can only be used along with the corresponding synthesis tool; produced cores cannot be inserted to a different synthesis tool. 
On the other hand, Eudoxus generated cores are not encrypted. This allows easy customization of the cores. For example, the designer can define his own gate-level implementations in the leaf cells of the core. Furthermore, Eudoxus gives the designer the opportunity to select, among several different architectures performing a specific operation, the one that best suits his needs. Moreover, because of the structural description, the cores that the proposed generator produces can be used with any logic synthesis tool. This feature provides high portability of the produced cores. Finally, commercial generators are only available to those who have installed the corresponding logic synthesis tool at their workplace. Eudoxus, on the other hand, is always available to the user through a network interface.

5. Conclusion

We have presented a network-accessible tool for generating cores that perform arithmetic operations, as well as testing structures. Once the user determines the type of the arithmetic operation or testing structure that he wishes, along with the operand size and the required architecture, the tool produces the requested core in structural VHDL and/or Verilog. The main benefits of the presented generator are:

1. It supports many (all commonly used) arithmetic operations including addition, subtraction, multiplication, division, squaring, square rooting and shifting.

2. It also supports pseudorandom testing structures that can be used for creating BIST schemes. A core without its corresponding testing scheme is not welcome in several applications.

3. For each operation several architectures are supported. This allows the designer to perform an architectural exploration and choose the architecture that best suits his specific needs. For that reason, Eudoxus minimizes the design effort and reduces time-to-market.

4.
The structural code that the tool produces is unencrypted, allowing the end designer to perform customization or further optimization.

Acknowledgment

This research was partially supported by the Public Benefit Foundation "Alexander S. Onassis" via its scholarships program.

References

[1] M. Keating, P. Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs, Kluwer Academic Publishers, 1998.
[2] G. Dare, D. Linzmeier, B. Deitrich, K. Whitelaw, Circuit generation for creating architecture-based virtual components, in: Proc. of Design Automation and Test in Europe Conference (User Forum), 2000, pp. 79–83.
[3] D. Alles, G. Vergottini, Taking a look at internet based design in the year 2001, Electronic Design (January 6) (1997) 42–50.
[4] M. Spiller, A. Newton, EDA and the Network, in: Proc. of International Conference on Computer-Aided Design, 1997, pp. 470–476.
[5] L. Walczowski, D. Nalbantis, W. Waller, K. Shi, Analogue layout generation by world wide web server-based agents, in: Proc. of European Design and Test Conference, 1997, pp. 384–388.
[6] T. Vassileva, V. Tchoumatchenko, I. Astinov, I. Furnadjiev, Virtual VHDL Laboratory, in: Proc. of 5th International Conference on Electronics, Circuits and Systems, 1998, pp. 325–328.
[7] D. Nalbantis, L. Walczowski, W. Waller, Multiple server WWW-based synthesis of VLSI circuits, in: Proc. of 5th International Conference on Electronics, Circuits and Systems, 1998, pp. 537–540.
[8] V. Tchoumatchenko, T. Vassileva, I. Vassilev, I. Furnadjiev, WWW based distributed FPGA design, in: Proc. of Design Automation and Test in Europe Conference (User Forum), 2000, pp. 97–101.
[9] H. Lavana, A. Khetawat, F. Brglez, K. Kozminski, Executable workflows: a Paradigm for collaborative design on the internet, in: Proc. of Design Automation Conference, 1997, pp. 553–558.
[10] J. Pardo, M. Iriso, T. Riesgo, E. de la Torre, Y. Torroja, J.
Uceda, AVI: a tool to learn VHDL through internet, in: Proc. of XV Design of Circuits and Integrated Systems Conference, 2000, pp. 224–229.
[11] M. Hayes, M. Jamrozik, Internet distance learning: the problems, the pitfalls and the future, Journal of VLSI Signal Processing 29 (1–2) (2001) 63–69.
[12] A. Schneider, R. McIlhenny, M. Ercegovac, BigSky - an on-line arithmetic design tool for FPGAs, in: Proc. of IEEE Symposium on Field-Programmable Custom Computing Machines, 2000, pp. 303–304.
[13] J. Pihl, J. Oeye, A web-based arithmetic module generator for high performance VLSI applications, in: Proc. of International Workshop on IP-Based Synthesis and System Design, 1998.
[14] Xtensa Configurable Processor, http://www.tensilica.com.
[15] M. Jacome, H. Peixoto, A survey of digital design reuse, IEEE Design and Test of Computers 18 (3) (2001) 98–107.
[16] D. Kumar, B. Erickson, ASOP: Arithmetic sum-of-products generator, in: Proc. of International Conference on Computer Design, 1994, pp. 522–526.
[17] M. Aberbour, A. Houelle, H. Mahrez, N. Vaucher, A. Guyot, On portable macrocell FPU generators for division and square root operators complying to the full IEEE-754 Standard, IEEE Transactions on VLSI Systems 6 (1) (1998) 114–121.
[18] W. Fang, A. Wu, D. Chen, EmGen - a module generator for logic emulation applications, IEEE Transactions on VLSI Systems 7 (4) (1999) 488–492.
[19] J. Pihl, E. Aas, A multiplier and squarer generator for high performance DSP applications, in: Proc. of 39th Midwest Symposium on Circuits and Systems, vol. 1, 1996, pp. 109–112.
[20] J. Abke, E. Barke, J. Stohmann, A universal module generator for LUT-based FPGAs, in: Proc. of International Workshop on Rapid System Prototyping, 1999, pp. 230–235.
[21] D. Bakalis, K.D. Adaos, D. Lymperopoulos, G.Ph. Alexiou, D. Nikolos, EUDOXUS: a WWW-based generator of reusable arithmetic cores, in: Proc. of International Workshop on Rapid System Prototyping, 2001, pp. 182–187.
[22] K.
Hwang, Computer Arithmetic Principles, Architecture and Design, John Wiley & Sons, 1979. [23] R.E. Ladner, M.J. Fischer, Parallel prefix computation, Journal of the ACM 27 (4) (1980) 831–838. [24] P. Kogge, H. Stone, A parallel algorithm for the efficient solution of a general class of recurrence equations, IEEE Transactions on Computers 22 (8) (1973) 783–791. [25] D. Pezaris, A 40 ns 17-bit by 17-bit array multiplier, IEEE Transactions on Computers C-20 (4) (1971) 442–447. [26] R. Baugh, A. Wooley, A twoÕs complement parallel array multiplication algorithm, IEEE Transactions on Computers C-22 (1–2) (1973) 1045–1047. [27] M. Annaratone, Digital CMOS Circuit Design, Kluwer Academic Publishers, 1986. [28] R.O. Duarte, M. Nicolaidis, H. Bederr, Y. Zorian, Efficient totally self-checking shifter design, Journal of Electronic Testing: Theory and Applications 12 (1–2) (1998) 29– 39. [29] G. Alexiou, N. Kanopoulos, A new serial/parallel twoÕs complement multiplier for VLSI digital signal processing, International Journal of Circuit Theory & Applications 20 (1992) 209–214. [30] P. Ienne, M.A. Viredaz, Bit serial multipliers and squarers, IEEE Transactions on Computers 43 (12) (1994) 1445– 1450. [31] P.H. Bardell, W.H. McAnney, J. Savir, Built-in Test for VLSI: Pseudo-random Techniques, John Wiley & Sons, New York, 1987. [32] K. Cattell, S. Zhang, Minimal cost one-dimensional linear hybrid cellular automata of degree through 500, Journal of Electronic Testing: Theory and Applications 6 (2) (1995) 255–258. [33] A.P. Stroele, BIST pattern generators using addition and subtraction operations, Journal of Electronic Testing: Theory and Applications 11 (1) (1997) 69–80. [34] ANSI/IEEE, IEEE standard for binary floating-point arithmetic, ANSI/IEEE Trans. Std. 754-1985. Dimitris N. Bakalis received the Diploma degree in 1995, the M.Sc. degree in 2000 and the Ph.D. 
degree in 2001 in Computer Engineering, all from the Department of Computer Engineering and Informatics at the University of Patras in Greece. He currently holds a lecturer position in the Physics Department at the same university. His main research interests include Rapid Prototyping, VLSI and System Design and Test, Low Power Design, Test and Design for Testability. He is a member of the IEEE.

Kostas D. Adaos received a Diploma degree from the Department of Computer Engineering and Informatics at the University of Patras in Greece in 1993, where he is pursuing his Ph.D. degree. He has gained experience in various positions in industry, where he has worked as an IC and system designer. He is currently working as an IC designer for Atmel Corporation. His main research interests include Rapid Prototyping, VLSI and System Design. He is a member of the IEEE.

Dimitrios Lymperopoulos received his Diploma in Computer Engineering in 2003 from the Department of Computer Engineering and Informatics, University of Patras, Greece. In 2003 he joined the Embedded Networks and Applications Lab (ENALAB) of the Electrical Engineering Department at Yale University. He is currently pursuing a Ph.D. degree in the area of embedded systems and wireless sensor networks.

Maciej Bellos received the Diploma in Computer Engineering in 1999 and the M.Sc. degree in Computer Science and Technology in 2001 from the Department of Computer Engineering and Informatics, University of Patras, Greece. He is currently pursuing the Ph.D. degree. His research is focused on efficient IC testing and design for testability. He is a member of the Technical Chamber of Greece.

H.T. Vergos received the Diploma in Computer Engineering and the Ph.D. in Fault Tolerant Computer Architectures in 1991 and 1996, respectively, both from the Department of Computer Engineering & Informatics, University of Patras, Greece, where he is currently an Assistant Professor. In 1998, he worked for Atmel Multimedia & Communications Group, developing the first IEEE 802.11 compliant wireless MAC device. He is the author of more than 30 scientific publications and holds one world patent. His interests include Computer Arithmetic, Rapid Prototyping, Dependable System Architectures, Low Power Design and Low Power Test.

George Ph. Alexiou received a B.Sc. in Physics in 1976 and a Ph.D. in Electronics in 1980, both from the University of Patras, Greece, where he is now a professor in the Department of Computer Engineering and Informatics and Director of the Microelectronics (VLSI) Lab. He has served on every programme committee of the IEEE International Symposium on Quality Electronic Design (ISQED) since 2000. He has also served on the programme committee of the IEEE Rapid System Prototyping Workshop since 2000. He has published papers in a number of international journals and conference proceedings. His research interests include VLSI design, VLSI CAD tools, signal processing, digital systems and RF data communications. He is a member of the IEEE.

Dimitris Nikolos received the B.Sc. degree in Physics, the M.Sc. degree in Electronics and the Ph.D. degree in Computer Science from the University of Athens, Athens, Greece. He is currently a Full Professor in the Computer Engineering and Informatics Department of the University of Patras, Patras, Greece, and head of the Technology and Computer Architecture Laboratory. He has authored or coauthored more than 140 scientific papers in refereed international journals and conferences and holds one US patent. His main research interests are fault-tolerant computing, computer architecture, VLSI design, low power design, test and design for testability. Prof. Nikolos was co-recipient of the Best Paper Award for his work “Extending the Viability of IDDQ Testing in the Deep Submicron Era” presented at the 3rd IEEE Int. Symposium on Quality Electronic Design (ISQED 2002). He has served as the Program Co-chairman of five IEEE Int. On-Line Testing Workshops (1997–2001). He also served on the program committees of the IEEE Int. On-Line Testing Symposiums (2003–2005), the IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (1997–1999), the Third and Fourth European Dependable Computing Conferences and the Design, Automation and Test in Europe (DATE) Conferences (2000–2005). He was a Guest Co-editor for the June 2002 special issue of the Journal of Electronic Testing: Theory and Applications (JETTA), which was devoted to the 2001 IEEE International On-Line Testing Workshop. He is a member of the IEEE.