Final Report

ECE 499/ELEC 543 DESIGN PROJECT

SYNTHESIS OF HCORDIC PROCESSOR

ALI NIA
MOUHANNAD OWEIS
TUBEGO PHAMPHANG
DAISUKE YOSHIDA

7/29/2011

Table of Contents
Abstract
Appendix A
Appendix B

LIST OF TABLES
Table 1: Interface signals for the overall design
Table 2: Interface signals for the X_ALU
Table 3: Interface signals for the Y ALU
Table 4: Interface signals for the K ALU component
Table 5: Interface signals for the Z ALU
Table 6: Interface signals for the multiplier
Table 7: Interface signals for the Exponents component
Table 8: Interface signals for the multiplier (mult18x18s)
Table 9: Interface signals for the exponent adder
Table 10: Interface signal for the exponent normalizer
Table 11: Interface signal for the floating point adder
Table 12: Linear vectoring
Table 13: Hyperbolic rotation
Table 14: Linear vectoring simulation results
Table 15: Circular vectoring simulation results
Table 16: Simulated results for circular rotation

LIST OF FIGURES
Figure 1: Diagram showing Cordic operations and corresponding equations
Figure 2: Flow chart of HCORDIC algorithm
Figure 3: HCORDIC overall structural design
Figure 4: Feedback multiplexer component
Figure 5: State diagram for the Finite State Machine
Figure 6: Rotation with lookup table
Figure 7: Linear vectoring
Figure 8: Vectoring with lookup table
Figure 9: Structural diagram for X ALU
Figure 10: Structural diagram for the Y ALU
Figure 11: Structural diagram for the K ALU
Figure 12: Structural diagram for the Z ALU
Figure 13: Structural diagram for the multiplier
Figure 14: Structural diagram for the Exponents component
Figure 15: Component diagram of a fixed point 18 x 18 bit multiplier
Figure 16: Component diagram for the exponent adder
Figure 17: Component diagram for normalizing the exponent
Figure 18: Component diagram for the floating point adder
Figure 19: State chart for performing the floating point addition/subtraction
Figure 20: Simulation results for hyperbolic rotation by one state

Abstract

The CORDIC algorithm has been a powerful method for computing elementary functions for more than 50 years, but with the multipliers found in modern processors it can be greatly enhanced. In this project, a floating point HCORDIC processor was designed in VHDL and synthesized on an FPGA board to demonstrate its power and efficiency relative to CORDIC. After implementation, testing methods such as module testing and integration testing were applied to verify the HCORDIC model derived from the HCORDIC theory proposed by Dr. F. Gebali. The simulations showed that HCORDIC required fewer iterations per computation than CORDIC, which requires 24 iterations for each computation. Thus the performance was increased by at least 55%.
1.0 INTRODUCTION

The Coordinate Rotation Digital Computer (CORDIC) was originally developed by Jack E. Volder [1] as a digital solution to the real time navigation problem. The algorithm employs only shift and add/subtract operations, which makes it very attractive for hardware implementation. CORDIC belongs to the class of digit-by-digit iterative numerical algorithms, which generate one true binary digit of the result in each iteration. Walther [2] later summarised the algorithm using a set of unified CORDIC iteration equations, shown below:

$$x_{i+1} = x_i + \mu m y_i \delta_i \qquad [1]$$
$$y_{i+1} = y_i - \mu x_i \delta_i \qquad [2]$$
$$z_{i+1} = z_i + \theta_i \qquad [3]$$

Using the unified CORDIC iteration equations, a wide range of mathematical functions can be computed. Figure 1, adapted from Walther [2], shows the outputs of the algorithm Xn, Yn and Zn for the three groups of operations "Circular", "Linear" and "Hyperbolic".

Figure 1: Diagram showing Cordic operations and corresponding equations

1.1 PROBLEM DESCRIPTION

The CORDIC algorithm has several limitations and drawbacks: the number of iterations is fixed for every computation, the outputs are scaled by a scale factor, the operating range is limited, and the algorithm cannot take advantage of modern microprocessor technology such as cheap multipliers and memory. This report proposes a modification of the original CORDIC equations and introduces a new approach together with its proven hardware implementation in the VHDL hardware description language.

1.2 SOLUTION

High Performance Adaptive Cordic (HCORDIC), presented by Dr. F. Gebali [3], is a modification of the original iterative CORDIC. HCORDIC requires fewer iterations and can double the number of elementary functions calculated by CORDIC. In HCORDIC at iteration i, the values of x, y and z are updated, as well as the current scale factor, according to the following equations:

$$x_{i+1} = x_i + m y_i \delta_i \qquad [4]$$
$$y_{i+1} = y_i - x_i \delta_i \qquad [5]$$
$$z_{i+1} = z_i + \theta_i \qquad [6]$$
$$K_{i+1} = K_i \times k_i \qquad [7]$$

Just like the regular CORDIC, HCORDIC has two operations, "Vectoring" and "Rotation", each of which has three modes: "Circular", "Linear", and "Hyperbolic". The equations for computing the values for the lookup tables use the following notation.

n = number of bits in the mantissa.
s = number of leading bits in the mantissa to be scanned in parallel.
Xe = exponent part of X.
Xf = mantissa part of X.

In the Vectoring operation, just as in the CORDIC algorithm, the goal is to bring y to zero (y→0). Ideally the step size δi would be chosen so that δi = yi / xi, but doing this in one step would require a very large look-up table. Thus the iterative value of δi is chosen such that

$$\delta_i = \begin{cases} 1 & |Y_i| \ge |X_i|,\ m \ne 0 \\ (Y_i/X_i)_s \times 2^{Y_e - X_e} & |Y_i| < |X_i|,\ m \ne 0 \\ (Y_i/X_i)_s \times 2^{Y_e - X_e} & m = 0 \end{cases}$$

$$\theta_i = \begin{cases} \tan^{-1}\delta_i & m = 1 \text{ (circular mode)} \\ \delta_i & m = 0 \text{ (linear mode)} \\ \tanh^{-1}\delta_i & m = -1 \text{ (hyperbolic mode)} \end{cases}$$
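The vectoring step selection above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the report's VHDL or its C reference code: the names `pad`, `vectoring_step` and `hcordic_vectoring` are hypothetical, (Y/X)_s is assumed to mean the quotient of the two values after each is truncated to its implied leading 1 plus s mantissa bits, exact division and `math.atan` stand in for the look-up tables, and only the circular (m = 1) and linear (m = 0) modes are covered.

```python
import math

def pad(v, s=4):
    """Truncate v to its implied leading 1 plus s mantissa bits (assumed padding rule)."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(abs(v))            # abs(v) = m * 2**e with 0.5 <= m < 1
    scale = 1 << (s + 1)                 # keeps the leading 1 and s fraction bits
    return math.copysign(math.ldexp(math.floor(m * scale) / scale, e), v)

def vectoring_step(x, y, m, s=4):
    """Return (delta, theta, k) for one vectoring iteration; x must be non-zero."""
    if m not in (0, 1):
        raise ValueError("this sketch covers only circular (1) and linear (0) modes")
    if m == 1 and abs(y) >= abs(x):
        delta = math.copysign(1.0, x * y)  # "by one" case, signed like y/x (assumption)
    else:
        delta = pad(y, s) / pad(x, s)      # stands in for the divider look-up table
    if m == 1:
        theta = math.atan(delta)
        k = 1.0 / math.cos(theta)          # per-step scale factor, as in the controller text
    else:
        theta = delta
        k = 1.0
    return delta, theta, k

def hcordic_vectoring(x, y, z=0.0, m=1, s=4, max_iter=24):
    """Iterate equations [4]-[7] until y reaches zero or max_iter is hit."""
    K = 1.0
    for _ in range(max_iter):
        if y == 0.0:
            break
        delta, theta, k = vectoring_step(x, y, m, s)
        x, y = x + m * y * delta, y - x * delta   # both updates use the old x and y
        z += theta
        K *= k
    return x, y, z, K

if __name__ == "__main__":
    x, y, z, K = hcordic_vectoring(3.0, 1.0, m=1)
    print("angle:", z, "magnitude:", x / K)
```

Dividing the final x by the accumulated K removes the per-step gain 1/cos θ, which is how the "scaled" row of the circular vectoring results is interpreted here.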
In the Rotation operation, as in the CORDIC algorithm, the objective is to bring z to zero (z→0). Ideally the step size θi would be chosen so that θi = zi, but doing this in a single step would require a very large look-up table. Hence the iterative value of θi has to be chosen in such a way that

$$\theta_i = \begin{cases} 1 & Z_e > 0 \\ -Z_s 2^{Z_e} & -n/2 \le Z_e \le 0 \\ -Z_i & Z_e < -n/2 \end{cases}$$

$$\delta_i = \begin{cases} \tan\theta_i & m = 1 \\ \tanh\theta_i & m = -1 \end{cases}$$
Up until now, both HCORDIC and CORDIC had been implemented in the C programming language on a regular CPU to demonstrate the advantages of HCORDIC over CORDIC. Figure 2 shows the HCORDIC algorithm.

Figure 2: Flow chart of HCORDIC algorithm

2.0 DESIGN SOLUTION

HCORDIC can be implemented in different ways depending on its intended use. It can be implemented in software for speeding up computations involving digital signal processing, or it can be implemented in a high level language such as C and then mapped to a soft-core processor to perform computations involving large data, for example as a co-processor in supercomputers.

In order to design the HCORDIC, the algorithm needs to be fully understood. Thus the classical CORDIC was first implemented in C, and then HCORDIC's rotation operation was also implemented in C. These implementations were fully tested to confirm that the algorithm had been understood correctly.

The project was then modelled by producing the overall design, based on the top level components realizing the HCORDIC. With the overall picture of the project in place, the components were further divided into smaller subcomponents. The architectures of both the top level components and the subcomponents are all structural, and the low level components they instantiate have behavioural architectures. The low level components are behavioural so that their behaviour can be confirmed before they are interconnected to form the upper level components. This design methodology ensures that if there is a bug, it can be traced to the lower components, since the ISE design suite can be used to debug the structural design, for example a case where a signal is driving multiple buses.

The top level components and subcomponents are described in detail in the following sections, which explain why they are designed the way they are and give their structural diagrams.

2.1 OVERALL DESIGN

Figure 3 shows the overall design of our HCORDIC implementation. It consists of a multiplexer, a finite state machine, and four ALUs running in parallel. The design has eight inputs and three outputs, as shown in Table 1 below.

Figure 3: HCORDIC overall structural design

Table 1: Interface signals for the overall design

SIGNAL  MODE  TYPE              COMMENT
X_in    IN    STD_LOGIC_VECTOR  32 bit floating point number
Y_in    IN    STD_LOGIC_VECTOR  32 bit floating point number
Z_in    IN    STD_LOGIC_VECTOR  32 bit floating point number
clock   IN    STD_LOGIC         System clock
op      IN    STD_LOGIC         Used for testing the component only
mode    IN    STD_LOGIC_VECTOR  2 bit number representing the mode (circular, hyperbolic, linear)
Nex     IN    STD_LOGIC         Triggers next transition
Load    IN    STD_LOGIC         Loads X_in, Y_in, and Z_in
Xf      OUT   STD_LOGIC_VECTOR  32 bit floating point number
Yf      OUT   STD_LOGIC_VECTOR  32 bit floating point number
zf      OUT   STD_LOGIC_VECTOR  32 bit floating point number

2.2 MULTIPLEXER

This component gives the user the chance to load the initial X_in, Y_in, and Z_in signals. It was also used to synchronize the different stages in the design. The figure below shows this component.

Figure 4: Feedback multiplexer component

2.3 CONTROLLER DESIGN

For the design of the controller, a finite state machine was used, as shown in Figure 5.
Depending on the inputs op, m, X, Y and Z, a different state is invoked to set the appropriate iteration variables θ (theta), δ (delta) and k (scaling factor). HCORDIC has 3 different modes: circular, linear and hyperbolic. It also has 2 different operations, vectoring and rotation. Thus there are a total of 6 different configurations. The controller was implemented as a finite state machine and it consists of 14 states. 8 of the 14 states are simple and trivial, as can be seen in the figure below.

Figure 5: State diagram for the Finite State Machine

State 1: Linear Rotation.
Based on the definition of linear rotation, θ is equal to δ and k is equal to 1. In this case θ = -Z, which is achieved in one iteration, and thus the solution is trivial.

State 2 and 3: Circular Rotation By 1 and Hyperbolic Rotation By 1.
To limit the size of the look-up table when |Z| is greater than 1, θ is set to 1, with δ set to tan(1) and k to 1/cos(1) in the circular case, or δ set to tanh(1) and k to 1/cosh(1) in the hyperbolic case.

State 4: Rotation with small theta.
It is shown in Appendix A that when the exponent is less than minus half the number of bits in the mantissa (Ze < -n/2), tan(θ) or tanh(θ) is equal to θ and no table look-up operation is required. Thus the output of this state is the same as in the Linear Rotation state.

State 5 and 6: Circular Rotation with Look-up table and Hyperbolic Rotation with Look-up table.
Both of these states were implemented as a single component to reduce the size of the hardware. The figure below shows the structure of this component.

Figure 6: Rotation with lookup table

The memory table consists of values of the padded θ and the corresponding tan θ, tanh θ, 1/cos θ, and 1/cosh θ. The padded value of theta is derived from Z so that their exponents are equal, but the mantissa of theta consists of the leading "s" bits of the Z mantissa with the remaining mantissa bits set to zero. This is done to limit the size of the look-up table. Hence the size of the look-up table is determined by the 12 different exponents and the number of "s" bits in the mantissa that are scanned in parallel. The number of those bits can be varied depending on the required speed. The size of the table is given by the equation shown below,

$$\text{Size of table} = (n/2) \times 2^s$$

where n is the number of bits in the mantissa and s is the number of bits in the mantissa to be scanned in parallel.

The table look-up mechanism was designed to optimize the memory access time. By carefully examining the 32-bit floating point number, it was noticed that only the last 4 bits in the exponent and the first s bits in the mantissa would change. For a 32 bit floating point number the exponents in the look-up table range from -1 to -12.

In this project s was chosen to be 4. Hence the bits indexed from 26 down to 19 in Z (the last 4 exponent bits and the first 4 mantissa bits) were used. The least value is "30" in hexadecimal and the maximum is "EF" in hexadecimal. The table was designed so that θ "39800000" is placed at the beginning of the table and "3f780000" at the end.

To find the address, bits 26 down to 19 in Z are passed to the minus 48 component, where the address is reduced by x"30". Then the output is sent to memory to get the corresponding information.

State 7: Linear Vectoring
Unlike Linear Rotation, this is a non-trivial state: it depends on finding δ through table look-up, after which θ is equal to δ and k is equal to 1. The table holds the result of dividing the two numbers Y and X, where the exponents of the two numbers are set to 1 and all the different combinations of the leading "s" mantissa bits of X and Y are taken. Figure 7 shows how the state was implemented.

Figure 7: Linear vectoring

The find_addr_of_divider component scans the leading s bits of the mantissa in both Y and X and outputs the corresponding address. The address is simply found by concatenating the "s" bits of Y and the "s" bits of X. The address is then passed to the memory, and the corresponding normalized output is retrieved and passed to the find_padded_number component, where (Ye-Xe) is added to the exponent to get the final $(Y/X)_s \times 2^{Y_e - X_e}$.
Hence, the size of the table is $2^s \times 2^s$. In this project s is 4 and the size of the table is 256, found by evaluating $2^4 \times 2^4$.

State 8 and 9: Circular_vectoring_by_one and Hyperbolic_vectoring_by_one
In order to limit the size of the look-up table, θ, δ, and k are set to arctan(1), 1, and 1/cos(arctan(1)) respectively for state 8, and to arctanh(.95), 1, and 1/cosh(arctanh(.95)) respectively for state 9.

State 10: vectoring_with_small_fraction
It is shown in Appendix A that when the exponent is less than minus half the number of bits in the mantissa, tan(θ) or tanh(θ) is equal to θ and no table look-up operation is needed. Thus the output of this state is the same as the one in the Linear Vectoring state when Ye - Xe < -12.

State 11 and 12: Circular Vectoring with Look-up table and Hyperbolic Vectoring with Look-up table
Both of these states were implemented as a single component to reduce the size of the hardware. The figure below shows the structure of this component.

Figure 8: Vectoring with lookup table

The memory address is derived from X and Y; more precisely, from the least significant 4 bits of the exponent Ye-Xe and the s bits of the Y and X mantissas. Hence the size of the table is $(n/2) \times 2^s \times 2^s$. For these two states, s was chosen to be only two in order to limit the size of the table.

2.4 ALU DESIGNS

There are four ALUs in the design, each one responsible for computing a single coordinate component, that is X, Y, Z and the scaling factor K. Since three of these ALUs (K, X, Y) require multiplication and three of them (X, Y, Z) require addition/subtraction, the ALU component is divided into adder/subtractor and multiplier subcomponents. These subcomponents are in turn divided into behavioural modules for computing floating point arithmetic. Both subcomponents were implemented so that they are independent of any particular ALU and can be reused across the ALUs. This design reduced the implementation time and made the debugging of the four ALUs much easier.

Figure 9: Structural diagram for X ALU

The above diagram shows an ALU for computing the X component of the coordinate given by $X_{i+1} = X_i + m\delta_i Y_i$. It consists of three components: multiplier, adder and MBM (Multiply By Mode). The MBM was not included as part of the multiplier so that the multiplier could be used in other ALUs. The design in Figure 9 sacrificed performance for code reusability, because when the mode (M) is equal to zero the multiplier still computes the floating point multiplication of delta and Y. The clock signal is only fed into the component so that it can be tested and it does not infer a latch. The structural diagrams for the multiplier and adder are elaborated later. Table 2 shows the interface signal definition for this component.
Table 2: Interface signals for the X_ALU

SIGNAL   MODE  TYPE              COMMENT
Yi       IN    STD_LOGIC_VECTOR  32 bit floating point number
Delta_i  IN    STD_LOGIC_VECTOR  32 bit floating point number
Xi       IN    STD_LOGIC_VECTOR  32 bit floating point number
M        IN    STD_LOGIC_VECTOR  2 bit number representing the mode (circular, hyperbolic, linear)
Clock    IN    STD_LOGIC         Used for testing the component only
X_next   OUT   STD_LOGIC_VECTOR  32 bit floating point number

Figure 10: Structural diagram for the Y ALU

The above diagram shows the ALU for computing the Y component of the coordinate given by $Y_{i+1} = Y_i - \delta_i X_i$. This component consists of a multiplier, an adder and an XOR gate for determining the sign of the product of -1 and $\delta_i \times X_i$. Since the multiplication by -1 is only required in the Y ALU, the logic for this multiplication was not included in the multiplier. The clock is also used for testing the ALU and its subcomponents. Table 3 displays the interface signal definition for this component.

Table 3: Interface signals for the Y ALU

SIGNAL   MODE  TYPE              COMMENT
Xi       IN    STD_LOGIC_VECTOR  32 bit floating point number
Yi       IN    STD_LOGIC_VECTOR  32 bit floating point number
Delta_i  IN    STD_LOGIC_VECTOR  32 bit floating point number
Clock    IN    STD_LOGIC
Y_next   OUT   STD_LOGIC_VECTOR  32 bit floating point number

Figure 11: Structural diagram for the K ALU

The above diagram shows the K ALU for computing the scaling factor given by $K_{i+1} = K_i \times k_i$. It consists of the multiplier component only; the details of the multiplier are discussed later. Table 4 below shows the signal definition for the K ALU.

Table 4: Interface signals for the K ALU component

SIGNAL  MODE  TYPE              COMMENT
Ki      IN    STD_LOGIC_VECTOR  32-bit floating point number
Ki      IN    STD_LOGIC_VECTOR  32-bit floating point number
Clock   IN    STD_LOGIC
K_next  OUT   STD_LOGIC_VECTOR  32-bit floating point number

Figure 12: Structural diagram for the Z ALU

The diagram above depicts an ALU for computing the accumulated angle, Z, given by $Z_{i+1} = \theta_i + Z_i$.
The Z ALU is a floating point adder (described later in this section) which takes the Zi value and the calculated value of θi from the finite state machine and adds them together. The generated Zi+1 ALU output is the next Zi iterated value, which is then fed back to the finite state machine via a multiplexer. Table 5 below shows the signal definition associated with this component.

Table 5: Interface signals for the Z ALU

SIGNAL  MODE  TYPE              COMMENT
Zi      IN    STD_LOGIC_VECTOR  32-bit floating point number
θi      IN    STD_LOGIC_VECTOR  32-bit floating point number
Clock   IN    STD_LOGIC
Z_next  OUT   STD_LOGIC_VECTOR  32-bit floating point number

2.4.1 MULTIPLIER COMPONENT

The multiplication of two floating point numbers involves fixed point multiplication of the mantissas and adding the exponents in fixed point format. Adding two biased exponents gives a doubly biased result; thus, to normalize the result, 127 must be subtracted from it. There are two cases that need to be accounted for when multiplying two mantissas:

1) Multiplying two mantissas can give a result that is 1 bit shorter than the full width; for example, multiplying 1.00 and 1.00 results in 1.0000, which is a 5-bit number. In this case, the result is normalized and there is no need to shift.
2) Multiplying two mantissas can also give a result with the full expected number of bits; for example, multiplying 1.10 and 1.11 results in 10.1010, which is a 6-bit number. In this case the result is not normalized and needs to be shifted to the right once. Shifting the mantissa to the right requires the sum of the exponents to be adjusted by adding one.

Figure 13 below summarises this logic.

Figure 13: Structural diagram for the multiplier

The above diagram shows the multiplier for performing the multiplication of two floating point numbers. This component consists of two components: mult18x18s for computing the fixed point multiplication of the mantissas, and Exponents for performing fixed point addition on the exponents.
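The two mantissa cases and the exponent adjustment described above can be illustrated with a small bit-level model. This is a software sketch under assumptions, not the VHDL of the mult18x18s or Exponents components: the function names are hypothetical, the inputs are IEEE 754 single precision bit patterns, each mantissa is cut to its 17 leading fraction bits with the implied 1 prepended, the result is truncated rather than rounded, and zero, subnormal, infinity and NaN inputs are not handled.

```python
import struct

def f32_bits(v):
    """Bit pattern of v as an IEEE 754 single precision number."""
    return struct.unpack(">I", struct.pack(">f", v))[0]

def bits_f32(b):
    """Single precision value of a 32-bit pattern."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp_mul(a_bits, b_bits):
    """Multiply two normalised single precision numbers with an 18x18 mantissa product."""
    sa, sb = a_bits >> 31, b_bits >> 31
    ea, eb = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    # 17 leading fraction bits with the implied 1 prepended: an 18-bit significand
    ma = (1 << 17) | ((a_bits >> 6) & 0x1FFFF)
    mb = (1 << 17) | ((b_bits >> 6) & 0x1FFFF)
    prod = ma * mb                      # 35 or 36 bits wide
    shift = prod >> 35                  # the 36th bit: 1 means the product is in [2, 4)
    # add the biased exponents, remove the extra bias, add 1 if the mantissa was shifted
    e = max(0, min(255, ea + eb - 127 + shift))
    # drop the leading 1 and keep the next 23 bits as the result fraction
    frac = (prod >> (12 if shift else 11)) & 0x7FFFFF
    return ((sa ^ sb) << 31) | (e << 23) | frac

if __name__ == "__main__":
    r = fp_mul(f32_bits(1.5), f32_bits(1.75))   # 1.10 x 1.11 in binary
    print(hex(r), bits_f32(r))
```

The example call uses the same operands as case 2 in the text (1.10 and 1.11 in binary), so the shift flag is set and the exponent is incremented by one.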
These subcomponents are covered in detail later in this section. The clock signal is only used for testing this component. Table 6 below shows the signal interface definition of this component.

Table 6: Interface signals for the multiplier

SIGNAL  MODE  TYPE              COMMENT
Ma      IN    STD_LOGIC_VECTOR  17-bit mantissa of a floating point number
Mb      IN    STD_LOGIC_VECTOR  17-bit mantissa of a floating point number
Ea      IN    STD_LOGIC_VECTOR  8-bit exponent of a floating point number
Eb      IN    STD_LOGIC_VECTOR  8-bit exponent of a floating point number
Sa      IN    STD_LOGIC         Sign of a floating point number
Sb      IN    STD_LOGIC         Sign of a floating point number
Clock   IN    STD_LOGIC
Reg     OUT   STD_LOGIC_VECTOR  32-bit floating point number

Figure 14: Structural diagram for the Exponents component

The above diagram shows the Exponents component, which computes the fixed point addition of the given exponents. The Exponents component's logic is divided into two subcomponents so that the addition of exponents can execute in parallel with the mantissa multiplier in Figure 14. This design also simplifies the logic for dealing with the exponent addition. The clock is only used for testing the subcomponents. The signal definition for this component is shown in Table 7.

Table 7: Interface signals for the Exponents component

SIGNAL  MODE  TYPE              COMMENT
Ea      IN    STD_LOGIC_VECTOR  8-bit exponent of a floating point number
Eb      IN    STD_LOGIC_VECTOR  8-bit exponent of a floating point number
Shift   IN    STD_LOGIC         Flag for determining whether the exponent needs to be normalised
Clock   IN    STD_LOGIC
Ef      OUT   STD_LOGIC_VECTOR  8-bit exponent result of a floating point number

Figure 15: Component diagram of a fixed point 18 x 18 bit multiplier

The component's function is to compute the fixed point multiplication of the mantissas. It takes two 17-bit mantissas instead of the normal 23-bit mantissa because the Xilinx soft core multiplier can only perform an 18x18 bit multiplication. Also, since the output is a 23-bit mantissa while the product of two 18-bit mantissas is 36 bits wide, the less significant bits of the mantissas being multiplied would be truncated in the output anyway. Multiplying full 23-bit mantissas would therefore waste processor time and resources. This component accepts 17-bit numbers instead of 18 because the implied 1 is prepended to each mantissa before the multiplication is performed. The shift signal is assigned to the 36th bit, which only goes low when the result is 1 bit shorter than the expected result (in this case 35 bits). The signal definition for this low level component is described below in Table 8.
Table 8: Interface signals for the multiplier (mult18x18s)

SIGNAL  MODE  TYPE              COMMENT
Ma      IN    STD_LOGIC_VECTOR  17-bit mantissa
Mb      IN    STD_LOGIC_VECTOR  17-bit mantissa
Clk     IN    STD_LOGIC
Mr      OUT   STD_LOGIC_VECTOR  23-bit mantissa
Shift   OUT   STD_LOGIC_VECTOR  Signal set to high when there is a need to shift the mantissa

Figure 16: Component diagram for the exponent adder

The function of the component in Figure 16 is to add two exponents and check whether the result falls in the correct range (-127 to 255). If the biased exponent result falls below zero, E is assigned zero, and if it is over 255, E is assigned 255. This component is implemented using a procedure to execute the sequential statements for dealing with the exponent range, and its signal interface definition is shown in Table 9.

Table 9: Interface signals for the exponent adder

SIGNAL  MODE  TYPE              COMMENT
Ea      IN    STD_LOGIC_VECTOR  8-bit exponent of a floating point number
Eb      IN    STD_LOGIC_VECTOR  8-bit exponent of a floating point number
Clk     IN    STD_LOGIC
E       OUT   STD_LOGIC_VECTOR  8-bit exponent result

Figure 17: Component diagram for normalizing the exponent

The function of the component in Figure 17 is to normalize the exponent by adding 1 to the exponent, E, if the shift signal is high. This component's architecture is also behavioural, and its sequential statements are defined in a procedure to avoid having to synchronize the top level components with the clock. The clock signal is only used to test this component, and the signal definition is described below in Table 10.

Table 10: Interface signal for the exponent normalizer

SIGNAL  MODE  TYPE              COMMENT
E       IN    STD_LOGIC_VECTOR  8-bit exponent
Shift   IN    STD_LOGIC         Bit set to determine when to add 1 to the exponent
Clk     IN    STD_LOGIC
Ef      OUT   STD_LOGIC_VECTOR  Normalized 8-bit exponent

2.4.2 FLOATING POINT ADDER COMPONENT

Figure 18: Component diagram for the floating point adder

The algorithm for this component is described as follows and its signal definition is described in Table 11. The IEEE 754 floating point addition/subtraction algorithm and implementation were considered for this project. The addition and subtraction are realised in the following steps.

a) Let X and Y be the operands, represented by (Sx, Mx, Ex) and (Sy, My, Ey).
b) Subtract the exponents (d = Ex-Ey).
c) Align the significands, which consists of the following:
   - Shift right by d positions the significand of the operand with the smallest exponent.
   - Select the largest exponent as the exponent of the result.
d) Add (subtract) the significands and produce the sign of the result. This is a signed addition, and the effective operation (add or subtract) is determined by the floating point operation and the signs of the operands as shown in e).
e) Effective operation (EOP):

   Floating Point Operation  Sign of Operands  Effective Operation (EOP)
   ADD                       Equal             Add
   ADD                       Different         Subtract
   SUBTRACT                  Equal             Subtract
   SUBTRACT                  Different         Add

f) Normalise the result. Three situations can occur:
   - The result is already normalised.
   - When the effective operation is an addition, there might be an overflow of the significand.
   - When the effective operation is a subtraction, the result might have leading zeros.
g) Perform the rounding according to the specific mode.

If an overflow occurs because of the addition, it is necessary to normalise by a right shift and increment the exponent. If the effective operation is a subtraction, the result might have leading zeros, in which case the significand is shifted left by the number of leading zeros and the exponent is decremented by the same number. Figure 19 below summarises the addition/subtraction algorithm implemented in VHDL.

Figure 19: State chart for performing the floating point addition/subtraction

Table 11: Interface signal for the floating point adder

SIGNAL  MODE  TYPE              COMMENT
Ma      IN    STD_LOGIC_VECTOR  23-bit mantissa
Mb      IN    STD_LOGIC_VECTOR  23-bit mantissa
Ea      IN    STD_LOGIC_VECTOR  8-bit exponent
Eb      IN    STD_LOGIC_VECTOR  8-bit exponent
Sa      IN    STD_LOGIC         Sign of floating point number
Sb      IN    STD_LOGIC         Sign of floating point number
clk     IN    STD_LOGIC
Mr      OUT   STD_LOGIC_VECTOR  23-bit mantissa
E       OUT   STD_LOGIC_VECTOR  8-bit exponent
S       OUT   STD_LOGIC         Final sign

3.0 TESTING PROCEDURE

For this project, a test-driven development methodology was followed. Debugging integrated code is time consuming and should be avoided. During development, each module was fully tested to verify its functionality. As the components were integrated, each resulting component was systematically tested in full. For example, Figure 20 shows the timing diagram of HYPERBOLIC_ROTATION_BY_ONE. To validate the functionality of this module, the following input was applied:

    Z_IN <= X"F800000"

and the expected output was as follows:

    THETA_OUT <= temp & b"011" & X"F800000";
    DELTA_OUT <= X"3f42f7d6"; -- tanh (1) in float
    K_OUT <= X"3f25e6e3"; -- 1/cosh (1) in float

After completing the test for each small module, the module(s) were integrated with the ALU and tested. The simulation result is shown below in Figure 20.

Figure 20: Simulation results for hyperbolic rotation by one state

3.1 ALU COMPONENTS TESTING

To ensure that the ALU components worked correctly, testing was first performed on the lower level modules. The upper level components instantiating the lower level modules were then tested, first by checking whether the boundary test cases, which are the test cases outside the allowed range, passed. Simple test cases, such as adding zero to a number and multiplying a number by 1, were then executed on all the structural components. After ensuring that the components instantiated in the ALUs were working properly, a sample of test cases, such as subtracting a big number from a small number and adding small numbers, was executed on the ALUs and confirmed by checking that the values of x, y, z and k changed.

3.2 CONTROLLER TESTING

The controller was tested by verifying that the right iterative variables were produced for the corresponding input combination. Once the FSM was tested and proved to be working, the ALUs were integrated and the result was rigorously tested for each of the six different configurations of the machine. The results in the tables below were obtained from the MODELSIM simulator. The numbers were later compared with the output of a CORDIC floating point C programme found in Appendix B.
Table 12: Linear vectoring

Table 13: Hyperbolic rotation

SIMULATED ITERATIVE RESULTS FOR Hyperbolic Rotation, Op='1', Mode='11'

       Actual Values                              Calculated Padded Values
Iter#  X  Y                   Z                   X  Y            Z  DELTA
0      1  0                   1                   1  0            1  0.761594156
1      1  0.7615928649902344  -4.6566129e-10      1  0.761594156  0  0.761594156

3.3 OVERALL TESTING (HCORDIC)

The overall design testing was done the same way as in the controller testing section, except that the iterative results were generated automatically by the HCORDIC design. The results are not included in this section because they are the same as the ones in sections 3.2 and 4.0.

4.0 RESULTS

The design was synthesized successfully and the result appeared on the SPARTAN3 FPGA board. The result was presented by connecting 32 LEDs to the FPGA board to represent a 32 bit floating point number. The three different outputs were observed by using the switches to select each of the outputs. The iterations were triggered by using the push button on the board. The results from the board were carefully observed and the tables were built. The result was examined against the classic floating-point CORDIC, whose C code is found in Appendix B.
Table 14: Linear vectoring simulation results

Linear Vectoring (Mode='01', Op='0')

             Actual Values
ITERATION    X    Y              Z
0            3    1              0
1            3    0.040000916    0.32000002
2            3    0.002500951    0.33250001
3            3    0.000157204    0.33250001
4            3    0.000157204    0.33328125
5            3    1.07E-05       0.33333007
6            3    6.49E-07       0.3333334
7            3    4.77E-08       0.33333358

CORDIC Values: X = 3, Y = 0; % error = 0.000025%

Table 15: Circular vectoring simulation results

Circular Vectoring (Op='0', Mode='00')

             Actual Values
ITERATION    X              Y           Z
0            3              1           0
1            3.2857132      0.142860    0.2782997
2            3.29082274     0.025514    0.3139987
3            3.29098582     0.001556    0.3206950
4            3.29098916     0.000263    0.3220378
5            3.29098916     1.09E-05    0.32198495
6            3.29098916     1.16E-05    0.32199568
7            3.29098916     1.56E-05    0.32199585
scaled       3.156282707    -           0.32199585

CORDIC Values: X = 3.1622782243, Y = -, Z = 0.3217506; % error = 0.18%

Table 16: Simulated results for circular rotation

Circular Rotation (Op='1', Mode='01')

             Actual Values
ITERATION    X            Y             Z
0            2            0             0.52359867
1            2            1.09259796    0.02359867
2            1.9743876    1.13948154    0.00016117
3            1.9742041    1.13979971    -5.68E-14
scaled       1.732546     1             -

Calculated Padded Values: X = 1.7320, Y = 1, Z = -, DELTA = 0.03%

5.0 ANALYSIS

Our numbers indicate that the final floating-point values were very close to those of the classical CORDIC. The slight difference could be due to problems in the ALU component or in the scaling-factor look-up tables, because the difference is extremely small for the linear mode and somewhat larger for the operations that require look-up tables. Based on our synthesis, the longest path delay was 20 ns, so an equivalent timing constraint was applied. It is also worth mentioning that the design failed to synchronize the different stages completely. The initial attempt was to give the finite state machine a different clock, faster than that of the other components. All such attempts were unsuccessful in the time available.
Hence synchronisation was established by making the multiplexer sensitive to a manually triggered signal. As for the FPGA board, based on the design summary, the total number of slices used was 2343 (27%), the number of flip-flops was 642 (3%), the memory utilisation was 3% and the power consumption was 0.179 W.

6.0 CONCLUSION

Based on the results, it was found that the HCORDIC is very reliable and more accurate compared to the standard CORDIC. In this project it was shown that the performance increased by at least 55%. Further performance gains could be achieved by choosing a larger s (the number of leading bits in the mantissa scanned in parallel).

7.0 RECOMMENDATIONS

In general the project was a success in terms of implementation and performance, but the synthesis time could be reduced by hard-coding the look-up table in memory. Secondly, the project could have been better demonstrated if the results had been presented dynamically on a computer screen rather than on LEDs on the FPGA board.

REFERENCES

[1] Jack E. Volder. The CORDIC trigonometric computing technique. IRE Transactions on Electronic Computers. Pages 330-334. September 1959.
[2] J. S. Walther. A unified algorithm for elementary functions. Proceedings of the AFIPS Spring Joint Computer Conference. Pages 379-385. 1971.
[3] Dr. F. Gebali. HCordic: A High Performance Adaptive Cordic Algorithm.

Appendix A

Look-Up Table

A look-up table was required to evaluate $\theta$ from $\delta$ in the vectoring modes and another table to evaluate $\delta$ from $\theta$ in the rotation mode. There is no need to store very small values of $\tan\theta$ and $\tanh\theta$, based on the Taylor series expansion:

$\tan\theta \approx \theta$ if $\frac{\theta^2}{3} \le 1$

For finite precision, this condition is true when

$\theta < \sqrt{3} \times 2^{-n/2}$

which is the point at which the first neglected term of $\tan\theta = \theta + \theta^3/3 + \dots$, taken relative to $\theta$, falls below $2^{-n}$. For floating-point data, we can safely assume that $\tan\theta = \theta$ when

$2 \times 2^{e_\theta} < \sqrt{3} \times 2^{-n/2}$

$2^{e_\theta} < 2^{-n/2}$

The above conclusion applies to the hyperbolic mode too, since the tanh function has a Taylor series expansion similar to that of the tan function. HCordic requires the following look-up tables:

1- $\delta$ evaluation: Table length $= 2^{2s} + \left(\frac{n}{2} + 1\right) 2^{s+1}$
2- $\theta$ evaluation: Table length $= \left(\frac{n}{2} + 1\right) 2^{s+1}$
3- $k$ update: Table length $= \left(\frac{n}{2} + 1\right) 2^{s+1}$

Appendix B

/************************ Cordic classical algorithm *************************/
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define tableSize 25 // number of bits in the mantissa
#define mode 0       // 1: circular 0: linear -1: hyperbolic
#define operation 1  // 1 is rotation and 0 is vector

struct mynode {
    float km;
    float delta_i;
    float theta_i;
    struct mynode* next;
};
typedef struct mynode node;

struct mytable {
    node* header;
    node* tail;
    int size;
};
typedef struct mytable table;

/*******************************************************************
 * Creates tableSize nodes, one for each iteration i. Each node holds
 * delta, theta, and k.
 *******************************************************************/
table* makeTable(int table_size)
{
    table* myTable = (table *) malloc(sizeof(table));
    myTable->size = table_size;
    myTable->header = NULL;
    myTable->tail = NULL;
    int i = 0;        // iteration variable
    float prev_k = 1; // accumulative scaling factor
    myTable->header = (node *) malloc(sizeof(node));
    node* current = myTable->header;
    while (1) {
        current->delta_i = (pow(2, (-1*i)));
        if (mode == 1) { // circular
            current->theta_i = atan(current->delta_i);
            current->km = (prev_k * sqrt(1 + mode*(pow(current->delta_i, 2))));
        }
        else if (mode == 0) { // linear
            current->theta_i = current->delta_i;
            current->km = 1;
        }
        else if (mode == -1) { // hyperbolic
            if (i == 0) {
                /* atanh(1) is infinite and sqrt(1 - 1) = 0 would zero every
                   later km, so entry 0 is only a placeholder; main() starts
                   at the second node. */
                current->theta_i = 0;
                current->km = 1;
            }
            else {
                current->theta_i = atanh(current->delta_i);
                current->km = (prev_k * sqrt(1 + mode*(pow(current->delta_i, 2))));
            }
        }
        prev_k = current->km;
        i++;
        current->next = NULL;
        if (i == myTable->size) break;
        current->next = (node *) malloc(sizeof(node));
        current = current->next;
    } /* End of while */
    myTable->tail = current; // last node created
    return myTable;
}

/* Writes one line per table entry: iteration, delta_i, theta_i and Km. */
void printTable(table* myTable)
{
    int i = 0;
    node* current = myTable->header;
    FILE *fp;
    fp = fopen("/home/mouhannad/Desktop/cordic/table.txt", "w");
    if (fp == NULL) { // the output file could not be opened
        perror("table.txt");
        return;
    }
    fprintf(fp, "iter# \t delta_i \t\t\t\t\t\t\t theta_i \t\t\t\t\t\t\t Km \n");
    while (current) {
        fprintf(fp, "%d \t\t %.9f \t\t %.9f \t\t %.9f \n", i, current->delta_i, current->theta_i, current->km);
        current = current->next;
        i++;
    }
    fclose(fp);
}

/* Divides value by 2^i by subtracting i from the exponent field of the
   32-bit float. A union is used so that the bits are not read through
   an int* cast. */
float divide_By_two_tothe_i(float value, int i)
{
    if (value != 0) {
        union { float f; unsigned int u; } bits;
        bits.f = value;
        unsigned int temp = bits.u & 0x807fffff; // sign and mantissa bits
        unsigned int e = (bits.u >> 23) & 0xff;  // biased exponent
        if (e <= (unsigned int) i) return 0;     // result would underflow
        bits.u = ((e - i) << 23) | temp;
        value = bits.f;
    }
    return value;
}

/* Performs one CORDIC iteration and returns the new {x, y, z} in a
   malloc'ed array that the caller must free. */
float* calculate(float x, float y, float z, node* current, int i)
{
    int mu = 1;
    float xi, yi, zi;
    if (operation == 1) { // rotation
        if (z >= 0) mu = -1;
        else mu = 1;
    }
    else if (operation == 0) { // vector
        if (y >= 0) mu = 1;
        else mu = -1;
    }
    xi = x + mode*mu*divide_By_two_tothe_i(y, i);
    yi = y - mu*divide_By_two_tothe_i(x, i);
    zi = z + mu*current->theta_i;
    float *values = (float *) malloc(3*sizeof(float));
    *values = xi;
    *(values+1) = yi;
    *(values+2) = zi;
    return values;
}

void destroyTable(table* myTable)
{
    node* current = myTable->header;
    node* next;
    while (current) {
        next = current->next;
        free(current);
        current = next;
    }
    free(myTable);
}

/**** MAIN *****/
int main()
{
    float x = 2;
    float y = 0;
    float z = 0.5235987305641174; // initialize the values
    int i = 0;
    table* myTable = makeTable(tableSize);
    printTable(myTable);
    node* current = myTable->header->next; // second node: delta = 2^-1
    FILE* fp;
    fp = fopen("/home/mouhannad/Desktop/cordic/output.txt", "w");
    if (fp == NULL) { // the output file could not be opened
        perror("output.txt");
        destroyTable(myTable);
        return 1;
    }
    fprintf(fp, "iter#(i) \t [--X--] \t\t\t\t\t\t\t [--Y--] \t\t\t\t\t\t\t [--Z--] \n");
    while (current) {
        fprintf(fp, "%d \t\t %.9f \t\t %.9f \t\t %.9f \n", i, x, y, z);
        /* The walk starts at the second node, so the shift that matches
           its theta_i is 2^-(i+1). */
        float* values = calculate(x, y, z, current, i + 1);
        x = *values;
        y = *(values+1);
        z = *(values+2);
        free(values);
        current = current->next;
        i++;
    }
    fclose(fp);
    destroyTable(myTable);
    return 0;
}