NON-UNIFORM QUASI-LINEAR INTERPOLATION FOR DIRECT
DIGITAL FREQUENCY SYNTHESIZERS
_______________
A Thesis
Presented to the
Faculty of
San Diego State University
_______________
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
in
Electrical Engineering
_______________
by
Eric Anthony Nunziato
Fall 2012
Copyright © 2012
by
Eric Anthony Nunziato
All Rights Reserved
DEDICATION
To My Family.
True greatness is when your name is like ampere, watt, and fourier - when it’s spelled in
lower case letter.
-Richard Hamming
ABSTRACT OF THE THESIS
Non-Uniform Quasi-Linear Interpolation for Direct Digital
Frequency Synthesizers
by
Eric Anthony Nunziato
Master of Science in Electrical Engineering
San Diego State University, 2012
Direct Digital Frequency Synthesizers provide a means to produce sinusoidal
waveforms from a single source signal, the system clock. They yield an extremely efficient
solution to creating a waveform or waveforms of high spectral purity and high resolution.
The designs can be found in many digital communication and radar systems.
This thesis investigates some of the state-of-the-art methods for implementing Direct
Digital Frequency Synthesizers (DDFS) via polynomial interpolation over segmented phase
values. Introduced is a novel DDFS interpolated by a quasi-linear polynomial over
non-uniform segmentation. This is done by breaking the cosine wave into segments with
non-uniform lengths. Each segment is approximated by either a linear or depressed parabolic
polynomial. More specifically, where the sinusoid is at its maximum, a depressed parabola is
used to model the shape rather than linear segments. This reduces the approximation error
compared to pure linear interpolation. The next step is to implement the design in hardware
to evaluate its performance (power consumption and speed). The output spectrum is analyzed
and the variable of interest (to be maximized) is the Spurious Free Dynamic Range (SFDR)
or ratio of the fundamental harmonic amplitude to that of the next greatest harmonic. The
SFDR is calculated and designed for in MATLAB. The system is then modeled using
Simulink where the SFDR of the output waveform is confirmed. The design is then
implemented in both forms of electronic design: FPGA and ASIC. The modeled system is
converted to Verilog hardware description language and implemented on FPGA using the
Xilinx ISE Design Suite 13.4 and the Virtex-5 design kit. Additionally, the system is
implemented for ASIC using Cadence Encounter to analyze chip size and power
requirements.
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
GLOSSARY
ACKNOWLEDGEMENTS
CHAPTER
1 INTRODUCTION
2 LITERATURE SURVEY OF INTERPOLATION METHODS
    2.1 Polynomial Interpolation
        2.1.1 Linear Interpolation
        2.1.2 Second-Order Interpolation
        2.1.3 Third-Order Interpolation
    2.2 Quasi-Linear Interpolation Method
    2.3 Non-uniform Piecewise-Linear Interpolation
3 NON-UNIFORM QUASI-LINEAR DESIGN
    3.1 SFDR Calculation and Coefficient Generation
    3.2 Design Modeling
    3.3 Synthesis and Analysis
4 CONCLUSION AND FUTURE WORK
REFERENCES
APPENDIX
A MATLAB DDFS FUNCTION
B VERILOG HDL IMPLEMENTATION
LIST OF TABLES
Table 3.1. Coefficients Calculated by Convex Optimization
Table 3.2. Implementation Coefficients
Table 3.3. Design Comparison
LIST OF FIGURES
Figure 1.1. High-level DDFS block diagram.
Figure 2.1. DDFS using linear interpolation.
Figure 2.2. Single phase multiplierless DDFS architecture.
Figure 2.3. Quasi-linear interpolation design.
Figure 2.4. Non-uniform segmentation. 32 uniform segments reduced to 17.
Figure 2.5. Non-uniform segmentation.
Figure 2.6. Multiplierless architecture employing non-uniform interpolation.
Figure 3.1. Quasi-linear non-uniform segmentation.
Figure 3.2. Half wave sinusoid with translation (left) and without (right).
Figure 3.3. Non-uniform quasi linear interpolated DDFS schematic.
Figure 3.4. System output (top); PSM output (middle); overflowing phase accumulator output (bottom).
GLOSSARY
DDFS      Direct Digital Frequency Synthesizer
PSM       Phase to Sine Mapper
DAC       Digital to Analog Converter
SNR       Signal to Noise Ratio
MAE       Maximum Absolute Error
SFDR      Spurious Free Dynamic Range
LUT       Look-Up Table
ROM       Read Only Memory
CORDIC    Coordinate Rotation Digital Computer
MSB       Most Significant Bit
HDL       Hardware Description Language
RTL       Register Transfer Logic
ACKNOWLEDGEMENTS
I would like to first and foremost thank Dr. Ashkan Ashrafi for mentoring me and
working so closely on this research. Without his close guidance and knowledge of DSP, this
thesis would not have been possible.
In addition to my advisor, it is a pleasure to thank the other members of my thesis
committee: Dr. Amirhossein Alimohammad and Dr. Christopher Paolini. I greatly appreciate
their time and effort in contributing to my education.
My sincere thanks go to Dr. fred harris, whose teachings on DSP topics have made
me a better engineer. His moral support and help with schematic design have been
invaluable.
I would also like to take this opportunity to thank fellow student Mark Manalo for
his time and especially for his instructive tutorials on the Virtex 5 FPGA board.
I extend sincere gratitude to my wife Carolyn for standing by me while I conducted
this research and for her support while I spent countless hours in the DSP lab.
Last but not least, I would like to thank my parents Robert and Kim for instilling the
values I hold today.
CHAPTER 1
INTRODUCTION
Since the start of the modern computing age, marked by the introduction of the mainframe
computer in the 1940s, there has been a need for a method to produce a waveform, or
multiple waveforms, of high resolution with ease of transition between frequencies. The
solution to this problem has been the use of Direct Digital Frequency Synthesizers to create a
stable, low cost (low power and small chip area) sinusoidal signal with high resolution and
spectral purity. In the early 1970s, the idea of the Direct Digital Frequency Synthesizer
(DDFS) was introduced in [1]. That design, and subsequently all future designs, consists of
three main blocks: the phase accumulator (PA); the phase to sine mapper (PSM); and the
digital to analog converter (DAC). As the name suggests, the phase accumulator dictates the
phase of the output sinusoid. The output of the PSM, a function of the input phase, is then
converted by the DAC if an analog output is desired. A high-level representation of a DDFS
is shown in Figure 1.1 [2].
Figure 1.1. High-level DDFS block diagram. Source: A. Ashrafi, R. Adhami,
and A. Milenkovic, “A direct digital frequency synthesizer based on the
Quasi-linear interpolation method,” IEEE Trans. Circuit Syst. I, Reg. Papers,
vol. 57, no. 4, pp. 863–872, Apr. 2010.
In Figure 1.1, the parameters L, W and D represent the accumulator size, the
accumulator output wordlength, and the system output wordlength, respectively. The
frequency of the sinusoid at the output is given as
    $f_{out} = \dfrac{FCW}{2^{L}}\, f_{clk}$    (1.1)
where $f_{clk}$ is the system clock frequency and the frequency control word, $FCW$, controls
the output frequency of the sinusoid, as the name implies, as well as the resolution.
The PA consists of a register and an adder, which are used to sum the constant value of the
frequency control word and the previous value stored in the register. Because the register
overflows, this results in a sawtooth signal as the input to the PSM. The contents stored in
the accumulator determine the phase angle of the output sinusoid for the next stage.
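As a rough illustration of the accumulator described above, the following Python sketch models a register that adds the frequency control word on every clock and wraps on overflow. The function names, the 4-bit register and the 16 MHz clock are assumptions chosen for the example and are not taken from the thesis design.

```python
def phase_accumulator(fcw, acc_bits, n_clocks):
    """Return the register contents over n_clocks clock cycles."""
    modulus = 1 << acc_bits              # an L-bit register wraps at 2**L
    acc = 0
    history = []
    for _ in range(n_clocks):
        history.append(acc)
        acc = (acc + fcw) % modulus      # add FCW and discard the overflow
    return history


def output_frequency(fcw, acc_bits, f_clk):
    """Output frequency FCW * f_clk / 2**L, following equation (1.1)."""
    return fcw * f_clk / (1 << acc_bits)


# Hypothetical values: 4-bit accumulator, FCW = 3, 16 MHz clock.
ramp = phase_accumulator(fcw=3, acc_bits=4, n_clocks=8)
f_out = output_frequency(fcw=3, acc_bits=4, f_clk=16e6)
```

With these values the register steps upward by 3 until it passes 15 and wraps, which is the overflow behaviour that sets the output period.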
The output of the PA is then truncated from L to W bits to reduce the size of the
architecture. Many previous implementations used a look-up table or read-only memory to
store the amplitudes of given sinusoids to achieve the phase-to-sine mapping. The contents in
the accumulator would provide the address in the look-up table which housed the amplitude
of the sinusoid at that phase angle. These have proven to be costly because they take up a
large amount of chip area. In the recent decade, great advancements have been made to
calculate the amplitudes rather than storing them. While this reduces the size of the
architecture and increases spectral purity, power consumption is slightly increased. This
thesis will focus only on arithmetical PSMs utilizing segmented polynomial interpolation to
calculate the output sinusoid’s amplitude for a given input phase.
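The look-up-table approach mentioned above can be sketched as follows. This is a minimal model, assuming a full-wave table with 2**W entries and D-bit signed amplitudes; the helper names and the chosen wordlengths (L = 24, W = 8, D = 10) are illustrative only.

```python
import math


def build_rom(w_bits, d_bits):
    """Table of 2**W sine amplitudes, scaled to signed D-bit integers."""
    scale = (1 << (d_bits - 1)) - 1
    size = 1 << w_bits
    return [round(scale * math.sin(2 * math.pi * i / size)) for i in range(size)]


def rom_psm(acc_value, l_bits, rom):
    """Phase-to-sine mapping: the W most significant accumulator bits address the ROM."""
    w_bits = len(rom).bit_length() - 1
    address = acc_value >> (l_bits - w_bits)   # truncate from L bits to W bits
    return rom[address]


rom = build_rom(w_bits=8, d_bits=10)
quarter = 1 << 22                              # one quarter of a 24-bit accumulator
peak = rom_psm(quarter, 24, rom)
also_peak = rom_psm(quarter + 12345, 24, rom)  # low bits are lost to truncation
```

The second call shows the effect of truncation: accumulator values that differ only in the discarded low-order bits map to the same amplitude.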
In order to validate the efficiency of a design, there are three parameters which are
used: signal to noise ratio (SNR); maximum absolute error (MAE); and Spurious Free
Dynamic Range (SFDR) [3]. The SNR is the ratio of the total signal power to that of the
output noise, the MAE is the greatest difference between the output signal and an ideal
sinusoid, and the SFDR is the ratio of the fundamental harmonic to the absolute value of the
greatest harmonic out of all others. A strategy to minimize the error of a generated sinusoid
is to maximize the SFDR. As in [4] and [2, 5, 6], the motivation in this thesis is to maximize
the SFDR. The SFDR is measured in dBc, decibels relative to the carrier, and is limited
by the chosen architecture as well as quantization of the digits used in calculations, and
therefore is bounded by a maximum value.
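To make the SFDR definition concrete, the sketch below estimates it from one period of samples with a direct discrete Fourier transform. The function name and the test signal (a unit tone with a spur 1000 times smaller) are assumptions made for illustration; the thesis itself computes the SFDR in MATLAB.

```python
import cmath
import math


def sfdr_dbc(samples):
    """Ratio, in dB, of the largest spectral line to the next largest one."""
    n = len(samples)
    mags = []
    for k in range(1, n // 2):           # positive-frequency bins, DC excluded
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    peak = max(range(len(mags)), key=mags.__getitem__)
    worst_spur = max(m for i, m in enumerate(mags) if i != peak)
    return 20 * math.log10(mags[peak] / worst_spur)


# Hypothetical test signal: fundamental in bin 3, one spur in bin 9.
n = 64
tone = [math.cos(2 * math.pi * 3 * i / n) + 0.001 * math.cos(2 * math.pi * 9 * i / n)
        for i in range(n)]
sfdr = sfdr_dbc(tone)
```

For this signal the amplitude ratio is 1000, so the result is very close to 60 dBc.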
The aforementioned ‘ROM based’ PSM design [1, 7] is the first of three general
categories of DDFSs. ‘Angle rotation’ [8-10] is the second category, while the third, and
the category with which this thesis is concerned, is ‘polynomial interpolation’ [2-6].
In ROM based designs, the values of the output sinusoid are stored in a Look-Up
Table (LUT) housed in Read Only Memory (ROM). At any given time the current value of
the accumulator provides the reference address to the corresponding output amplitude. The
values in the LUT are calculated in advance and, depending on how much precision is
required, can result in large ROMs being required. There have been advancements made in
reducing the size of the ROM, shown in [7] and analyzed in [11]. The output of the
accumulator is truncated to reduce the complexity of the architecture. Most, if not all, recent
designs [2-6, 8-10] adopt this technique. As noted in [2], truncating the accumulator output
in this manner reduces the SFDR significantly, making it necessary to optimize the system so
that the actual SFDR is closer to the upper bound.
Designs consisting of an angular rotation PSM, described in [8] and [9] and summarized
in [10], use the CORDIC (Coordinate Rotation Digital Computer) algorithm to calculate the
amplitude of the output sinusoid. The algorithm uses the relationships (or conversion
calculations) between polar and rectangular coordinate systems to produce trigonometric
functions. This method proves to be effective in that the calculations are reduced to binary
shifts and therefore the system is multiplierless. The CORDIC method is improved in [12]. A
method is presented for correction of the error caused by the method in [7], when the output
of the accumulator is truncated. The method utilizes the unused accumulator bits after
truncation to add gain to the output of the PSM. The CORDIC algorithm relies on comparison
between the current and previous output values, and the sign detection involved in this
calculation sets the limit for high-speed implementations. Thus another type of calculation
has been developed.
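The angle-rotation idea can be illustrated with a textbook rotation-mode CORDIC loop. This is a generic floating-point sketch, not the architecture of [8]-[10]: the function name and iteration count are assumptions, and the multiplications by 2**-i stand in for the binary shifts used in hardware.

```python
import math


def cordic_sin_cos(angle, iterations=16):
    """Rotation-mode CORDIC for |angle| <= pi/2; returns (sin, cos)."""
    # Elementary rotation angles atan(2**-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-scale by the product of cos(atan(2**-i)) to cancel the CORDIC gain.
    gain = 1.0
    for a in angles:
        gain *= math.cos(a)
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # sign detection on the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(0.6)
```

Each iteration depends on the sign of the residual angle left by the previous one, which is the serial dependency that the text above identifies as the speed limit.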
The final category of phase to sine mapper is calculation by polynomial interpolation.
In this method, the phase of the sinusoid is broken into segments. The output sinusoid’s
amplitude is calculated by an n-degree polynomial that is a function of the phase. The
coefficients of the polynomial vary by segment. The degree of the interpolating polynomial
can vary depending on how complex the desired architecture should be. The interpolating
polynomial can be first degree (linear) [4, 6], second degree (parabolic) [13, 14], variable
degree (polynomial) [3, 5], or a combination of first and second order [2]. This method has
proven to be the most effective in reducing architecture size with only slight degradations of
the SFDR.
Irrespective of which method is used, applications for DDFS span a wide range of
electronics and communications systems. An example of a specific application is in software
defined radios, examined in [15] and [16]. The DDFS, or Direct Digital Synthesizer (DDS)
as described in industry, provides the waveform that is transmitted at the desired frequency.
Similar to radios, RADAR systems utilize DDFS chips. In [17] and [18], the DDFS provides
the initial FM signal that is then modified specific to the application and transmitted as well.
The chapters of this thesis are organized in the following manner. Chapter 2 discusses
three techniques providing a background on the novel approach in this thesis. Chapter 3 is
dedicated to the design of this thesis, quasi-linear, non-uniform interpolation, providing a
thorough explanation of the technique, design modeling and analysis of the system. Lastly,
Chapter 4 concludes this thesis while providing topics for future developments.
CHAPTER 2
LITERATURE SURVEY OF INTERPOLATION
METHODS
The papers cited in this thesis all use segmentation, either uniform or non-uniform, to
interpolate the amplitude of the output sinusoid. In uniform segmentation, the first quadrant
is divided into a number of segments equal in length. The first quadrant of a sine wave is
shown in [4] as
    $\sin\left(\dfrac{\pi x}{2}\right), \quad 0 \le x < 1.$    (2.1)
The number of segments is $s = 2^{u}$, where $u$ represents an integer, to simplify the
architecture. Then the segment length is $1/s$, since the length of each segment is equal [3].
A technique to reduce the complexity of the architecture is to exploit the quarter wave
symmetry of a sinusoidal wave. The amplitudes of the first quadrant can be used to find those
of the other three with basic manipulation. The second quadrant of a sine wave can be
formed by reflecting the first quadrant over the line $x = 1$, the end of the first quadrant in
normalized phase. The third and fourth quadrants can be formed in the same manner. This
reduces the number of entries in the look-up table to one-fourth of the original, allowing the
use of a smaller ROM to store the coefficients. For the methods of interpolation, this
technique allows the calculations to use a smaller number of bits.
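The quarter-wave reconstruction described above can be sketched as below. The phase here is assumed to be normalized so that each quadrant spans one unit (a full period is 0 to 4), and the function names are illustrative; in hardware the quadrant would be read from the two most significant phase bits.

```python
import math


def first_quadrant(x):
    """First quadrant of the sine wave, sin(pi*x/2) for 0 <= x <= 1."""
    return math.sin(math.pi * x / 2)


def full_wave(phase):
    """Sine for a normalized phase in [0, 4), using only first-quadrant values."""
    quadrant = int(phase) % 4
    x = phase - int(phase)
    if quadrant in (1, 3):
        x = 1.0 - x                  # mirror the phase in quadrants 2 and 4
    value = first_quadrant(x)
    return -value if quadrant >= 2 else value   # negate in quadrants 3 and 4


samples = [full_wave(p / 16) for p in range(64)]
```

Only `first_quadrant` ever evaluates the sinusoid, so any table or interpolator needs to cover just one quarter of the period.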
This chapter details three specific types of interpolated DDFSs: interpolation using
uniformly segmented polynomials, quasi-linear (first and second order) interpolation over
uniform segmentation, and linear interpolation over non-uniform segmentation.
2.1 POLYNOMIAL INTERPOLATION
The following section on uniform polynomial interpolation is divided into three
subsections: linear interpolation, second-order interpolation and third-order interpolation.
2.1.1 Linear Interpolation
The first method of PSM is the linear interpolation over segmentation proposed by
Langlois and Al-Khalili in [4]. The first quadrant is estimated with the interpolating
polynomial
,
(2.2)
where
while
and
;
1,2,3 …
1
represent the respective initial amplitudes and slopes for each
segment, k. Figure 2.1 shows the architecture of [4] using linear interpolation. After truncating
the accumulator output
, where following [7] dictates
4 , the first 2 Most
Significant Bits (MSBs) are used to reconstruct the output using quarter wave symmetry, also
known as format conversion. The remaining M-2 bits represent a fixed point binary number,
the quarter wave phase in $[0, \pi/2)$. The phase is normalized to the interval $[0, 1)$. The
phase value is then multiplied by a slope and summed with an initial amplitude. The slope
and initial amplitude used, which are the polynomial coefficients, correspond to the segment
in which the phase falls. In the final stage, the value from the adder is converted to form a
complete sinusoid.
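The multiply-and-add datapath described above can be mimicked in a few lines. This sketch assumes the simplest possible coefficients, a chord through the segment endpoints with the slope applied to the offset from the segment start; the coefficients in [4] are instead chosen by spectral analysis, so the numbers here are only indicative.

```python
import math


def build_segments(s):
    """Initial amplitude and slope for s equal segments of sin(pi*x/2)."""
    table = []
    for k in range(s):
        y0 = math.sin(math.pi * (k / s) / 2)
        y1 = math.sin(math.pi * ((k + 1) / s) / 2)
        table.append((y0, (y1 - y0) * s))    # slope = rise / segment length
    return table


def interpolate(x, table):
    """Piecewise-linear estimate of sin(pi*x/2) for 0 <= x < 1."""
    s = len(table)
    k = min(int(x * s), s - 1)               # upper phase bits select the segment
    y0, slope = table[k]
    return y0 + slope * (x - k / s)          # slope times phase plus amplitude


table = build_segments(16)
max_err = max(abs(interpolate(i / 4096, table) - math.sin(math.pi * i / 8192))
              for i in range(4096))
```

With 16 segments the largest error of this chord fit is roughly 1.2e-3, and it shrinks by about a factor of four each time the number of segments doubles.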
Figure 2.1. DDFS using linear interpolation. Source: J. M. P. Langlois and D. Al-Khalili, “Novel approach to the design of direct digital frequency synthesizers based on
linear interpolation,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol.
50, no. 9, pp. 567–578, Sept. 2003.
Spectral analysis is used to find the polynomial coefficient values for implementation.
The coefficients depend directly on the maximum SFDR attainable, which has the relation to
the total number of segments
16
1
(2.3)
where, as mentioned previously, $s$ is the total number of segments. A greater desired
SFDR requires a greater number of segments, thus increasing the architecture
complexity. Once the coefficients have been calculated, the method of ‘Canonical Signed
Digit Representation’ [19] is used in DDFSs to implement a multiplierless architecture. The
coefficients are chosen (quantized) such that they can be expressed with powers of two
summed together. For example, suppose $m_i = 1.2337$, which quantized to five bits (by
multiplying by $2^4$) is 10011 in binary notation. The product of the quantized coefficient
and the phase $x$ can be represented by the sum
    $m_i \ast x = 2^{0} x + 2^{-3} x + 2^{-4} x.$    (2.4)
It can easily be seen that the addends on the right-hand side of (2.4) are arithmetic right
shifts of the signal.
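The shift-and-add product in the example above can be checked with integer arithmetic. The helper names below are hypothetical, and the sketch assumes a positive coefficient smaller than 2 that is truncated (not rounded) to its fractional bits.

```python
def quantize(coefficient, frac_bits):
    """Truncate a positive coefficient to an integer with frac_bits fractional bits."""
    return int(coefficient * (1 << frac_bits))


def shift_add(phase, coeff_word, frac_bits):
    """Multiply an integer phase by coeff_word / 2**frac_bits using shifts and adds.

    Assumes coeff_word has at most frac_bits + 1 bits (coefficient below 2).
    """
    total = 0
    for position in range(coeff_word.bit_length()):
        if (coeff_word >> position) & 1:
            # Bit `position` carries the weight 2**(position - frac_bits).
            total += phase >> (frac_bits - position)
    return total


word = quantize(1.2337, 4)               # 0b10011
product = shift_add(1024, word, 4)       # 1024 + (1024 >> 3) + (1024 >> 4)
```

The quantized word represents 1.1875 rather than 1.2337, so for a phase of 1024 the shift-and-add result is 1216; the gap to the exact product is the quantization error of the multiplierless form.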
A representation of the linearly interpolated DDFS in [4] modified to remove all
multipliers is shown in Figure 2.2.
The value of the truncated phase provides the selection criteria for which inputs to the
multiplexors are passed to the outputs. These inputs consist of the shifted phase values
previously mentioned and inverters to accommodate negative coefficients (slopes). It should
also be noted that the multiplierless technique introduces quantization error. Removing the
multipliers results in a faster, more efficient architecture, a benefit that outweighs the
drawback of this error.
2.1.2 Second-Order Interpolation
The order of the interpolating polynomial can be increased to design for a higher
SFDR though the complexity of the architecture is also increased due to the use of a squarer.
The output sinusoid in [3] is interpolated by
(2.5)
where
is the parabolic polynomial coefficient and all others the same as defined in
equation (2.2). The SFDR upper bound for a DDFS approximated using this piecewise-quadratic method is shown in [3] as
Figure 2.2. Single phase multiplierless DDFS architecture.
256
96
24
1.
(2.6)
Using the SFDR calculated using equation (2.6), the DDFS architecture can be
realized in the same manner as section 2.1.1 with the addition of a squarer to accommodate
the quadratic term.
2.1.3 Third-Order Interpolation
Similar to second-order, a third-order interpolating polynomial will result in an even
higher SFDR. The complexity of the architecture is also increased due to the cubed term in
the polynomial. The output sinusoid in [3] is approximated using
(2.7)
where
is the third-order polynomial coefficient and all others the same as defined
previously. The SFDR upper bound is then found using
5120
768
3
5
.
(2.8)
As in section 2.1.2, the architecture is then designed using the new SFDR, from
equation (2.8), and implemented following [4]. Since the requirement of the cubed phase will
significantly increase the complexity of the architecture, this method should only be used if a
very high SFDR is needed.
2.2 QUASI-LINEAR INTERPOLATION METHOD
In [2], Ashrafi et al. introduce a concept of separating the first quadrant of the
sinusoid into 2 regions: piecewise even parabolic and piecewise linear. This method is coined
the QLIP method because it utilizes a quasi-linear interpolation polynomial. All other
interpolation methods discussed in this chapter considered a sine function. Using parabolic
interpolation requires the use of a squarer so the concept is realized for cosine. For
completeness, the first quadrant of the cosine function is
    $\cos\left(\dfrac{\pi x}{2}\right), \quad 0 \le x < 1.$    (2.9)
Having the center of the parabola at $x = 0$ reduces the complexity of the calculations.
Each segment of the first quadrant is approximated with the interpolating polynomial
,
,
where, as previously noted,
segments and
1
1
,
(2.10)
represents the segment number, s is the total number of
is introduced as the segment number that corresponds to the transition from
parabolic to linear interpolation. The authors have found that
3
4 results in the highest
SFDR. The next step is to calculate the upper bound of the SFDR. The SFDR upper bound and
polynomial coefficients are found by using the Fourier series. Because the cosine function is
considered, the even symmetry results in all sine-term Fourier coefficients being exactly
zero. The cosine-term coefficients are calculated as
∞
cos
nπ
2
(2.11)
,
where
(2.12)
nπ
cos
2
2
,
and modifying for segmentation results in
(2.13)
nπ
cos
2
As previously noted in this section,
,
.
and s represent the interpolating polynomial
and the number of segments, respectively. Incorporating allows (2.13) to be rewritten as
2∑
cos
π
2∑
cos
π
(2.14)
The above equation is simplified to
2∑
2∑
(2.15)
where
nπ
cos
2
(2.16)
,0
2.
Equation (2.15) is then transformed to matrix form as
2
,
(2.17)
,
1
(2.18)
where
,
1
(2.19)
,
1
(2.20)
,
1
(2.21)
and
∈
∈
1,2,3, … ,2
,
(2.22)
1,
where N represents the number of odd harmonics used for calculations. Now (2.17) can be
rewritten as
(2.23)
where the elements of vector a are the harmonics of the output sinusoid and
|
2
|
|…|
(2.24)
|…|
(2.25)
where
∈
An ideal harmonic vector,
,
′
∈
∈
∈
′
.
(2.26)
,is introduced as
1 0…0 .
(2.27)
The polynomial coefficients, , can be found by solving
′.
From [5], the number of odd harmonic
(2.28)
2 and since
2 , equation (2.28) is
an overdetermined system of linear equations. Seeing that no solution exists, a technique
known as the Chebyshev minimax approximation is used to numerically solve the system of
equations. This is done by minimizing the infinity norm of the error vector
‖
′
‖∞
∞
.
(2.29)
In order to find the solution, the following relationship is defined
′
(2.30)
where is the real, positive, upper bound of the error vector. The system in (2.28), constrained
by (2.30), is solved for using linear programming as
,
0 1,
∈
(2.31)
.
where
∈
,
∈
,
∈
,
,
,
′
.
(2.32)
The linear programming algorithm returns the vector
|
(2.33)
where represents the Chebyshev minimax solution to equation (2.28) whose minimax value
is , or the minimum value in equation (2.29). Using this minimax value, the upper bound of
the SFDR is
1
.
(2.34)
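For readers unfamiliar with the minimax step used above, the standard way to pose a Chebyshev (infinity-norm) approximation problem as a linear program is sketched below. The symbols $\mathbf{A}$, $\mathbf{x}$, $\mathbf{b}$ and $t$ are generic placeholders assumed for this illustration; they are not the notation of [2].

```latex
\begin{aligned}
\min_{\mathbf{x},\,t}\quad & t \\
\text{subject to}\quad & \mathbf{A}\mathbf{x} - \mathbf{b} \le t\,\mathbf{1}, \\
& -(\mathbf{A}\mathbf{x} - \mathbf{b}) \le t\,\mathbf{1}, \\
& t \ge 0 .
\end{aligned}
```

At the optimum, $t$ equals $\|\mathbf{A}\mathbf{x} - \mathbf{b}\|_\infty$, so minimizing the single scalar $t$ minimizes the largest entry of the error vector.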
The coefficients in the Chebyshev minimax solution are then quantized to reduce the
complexity of calculations in the system and then optimized via a non-linear algorithm. Once
the coefficients are returned, the design is implemented via multiplexer implementation
shown in Figure 2.3 [2].
Figure 2.3. Quasi-linear interpolation design. Source: A. Ashrafi, R. Adhami, and A.
Milenkovic, “A direct digital frequency synthesizer based on the Quasi-linear
interpolation method,” IEEE Trans. Circuit Syst. I, Reg. Papers, vol. 57, no. 4, pp. 863–872,
Apr. 2010.
The difference between this design and that of [4] is the addition of two blocks. The
first is the pre-truncated squarer (truncated by λ bits), which provides the squared phase value
for calculation of the parabolic region. Secondly, a logic block is used to pass the value of
either the phase or squared phase to the adder. The 24-bit accumulator is truncated to W bits,
the two MSBs of which are used for format conversion. The inputs to the multiplexers
represent the quantized polynomial coefficients as
,
2
(2.35)
which is the Canonic Signed Digit [19] notation. The coefficients are then summed together
and converted from single quadrant values to represent all four quadrant amplitudes.
2.3 NON-UNIFORM PIECEWISE-LINEAR INTERPOLATION
In [6], De Caro et al. use the method of linear interpolation, used by Langlois above, to
evaluate the amplitude of the output sinusoid. The new idea conveyed in this paper is a
technique to reduce the total number of segments, thus reducing the number of coefficients
that need to be stored. This results in fewer multiplexers and, inherently, fewer shift and sum
operations. The overall outcome is a smaller, more efficient architecture.
The authors develop a segmentation scheme that is based upon uniform segmentation.
They suggest removing the boundary between adjacent segments to form a macro-segment of
twice the uniform segment length. Combining this newly formed segment with the next
adjacent segment results in three times the uniform segment length. This method can be
continued to form macro-segments of any integer multiple of the uniform segmentation
length. The authors implement a linear programming algorithm to calculate whether adjacent
segments are continuous at the boundary. If so, the two segments are joined, reducing the
total number of segments, which reduces the complexity of the architecture.
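The merging idea can be illustrated with a much simpler stand-in for the linear program of [6]. The sketch below greedily joins adjacent uniform segments of the first-quadrant sine as long as a straight line through the merged endpoints is no worse than the worst uniform segment. The chord-error test, the tolerance rule and all function names are assumptions for this example and are not the authors' algorithm.

```python
import math


def chord_error(a, b, samples=64):
    """Largest |sin(pi*x/2) - chord| on [a, b] for the chord through the endpoints."""
    fa, fb = math.sin(math.pi * a / 2), math.sin(math.pi * b / 2)
    worst = 0.0
    for i in range(samples + 1):
        x = a + (b - a) * i / samples
        chord = fa + (fb - fa) * (x - a) / (b - a)
        worst = max(worst, abs(math.sin(math.pi * x / 2) - chord))
    return worst


def merge_segments(s):
    """Greedily merge uniform segments without exceeding the worst uniform error."""
    tolerance = max(chord_error(k / s, (k + 1) / s) for k in range(s))
    edges = [0.0]
    start = 0
    while start < s:
        end = start + 1
        # Extend the macro-segment while the longer chord stays within tolerance.
        while end < s and chord_error(start / s, (end + 1) / s) <= tolerance:
            end += 1
        edges.append(end / s)
        start = end
    return edges


edges = merge_segments(32)
```

Because the sine is nearly straight close to x = 0 and most curved close to x = 1, the long macro-segments appear at the start of the quadrant and the uniform length survives near its end.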
An optimal example of this non-uniform segmentation is shown in Figure 2.4. The
initial 32 uniform segments are reduced to 17 non-uniform segments without degradation of
the SFDR. The optimal design does not always lend itself to the most efficient architecture
because logic is needed to map the phase to the segment number.
To further simplify the architecture, the authors devised 3 schemes which result in
efficient circuit implementations without introducing a significant amount of error. Each
scheme is separated into 3 regions depending on the segment length. The schemes evolved
this way because the approximation error of a linearly interpolated sinusoid is proportional to
the second derivative of the sine function [6]. For the first quadrant, using equation (2.1),
Figure 2.4. Non-uniform segmentation. 32
uniform segments reduced to 17.
1
2
0
1
(2.36)
and therefore the non-uniform segment lengths chosen are
,0
2 ,
4 ,
where
and
1,
are the boundaries between regions and, as previously noted,
(2.37)
represents the
truncated phase.
The next challenge is to find where the boundaries between regions should be. It is
shown that, just like the number of uniform segments, the boundaries should be chosen such
that
2 ,
2 ,
1,2,3 …
1,2,3 …
(2.38)
which is another result of the ‘Canonical Signed Digit Form’ [19]. The authors have found
that having the first boundary at
0.25 is the best. The second boundary varies in the
three schemes. All three design schemes can be seen in Figure 2.5 [6].
Note the segment lengths beneath the region number. The resulting multiplierless
architecture from this interpolation method is shown in Figure 2.6 [6].
Just as in the previous two methods, the phase values are shifted and summed. The
authors also add the value ‘KR’ in order to truncate the 2 LSBs shown in the ‘T2’ block of
Figure 2.6 [6].
Figure 2.5. Non-uniform segmentation. Source: D. De Caro, N. Petra,
and A. G. M. Strollo, "Direct Digital Frequency Synthesizer Using
Nonuniform Piecewise-Linear Approximation," IEEE Trans. Circuit
Syst. I, Reg. Papers, vol. 58, no. 10, pp.2409-2419, Oct. 2011.
The method in [6] has proved to be the most efficient design of the current
polynomial interpolated DDFS architectures. As with any other technology, there are still
improvements that can be made.
Figure 2.6. Multiplierless architecture employing non-uniform interpolation. Source: D.
De Caro, N. Petra, and A. G. M. Strollo, "Direct Digital Frequency Synthesizer Using
Nonuniform Piecewise-Linear Approximation," IEEE Trans. Circuit Syst. I, Reg.
Papers, vol. 58, no. 10, pp.2409-2419, Oct. 2011.
CHAPTER 3
NON-UNIFORM QUASI-LINEAR DESIGN
The design introduced in this thesis investigates a new approach to polynomial
interpolation. The new development is using non-uniform segmentation for quasi-linear
interpolation. The previous method [2] interpolated over uniform segmentation, while the
design in this thesis uses first and second order interpolation over the non-uniformly
segmented phase with $\{0, \tfrac{1}{2}, \tfrac{5}{8}, \tfrac{3}{4}, \tfrac{7}{8}, 1\}$ as
segment boundaries. The framework for segmentation starts with 8 uniform segments (s=8)
and merges the first 4 segments (corresponding to phase $0 \le x < 0.5$) into a macro
segment. For this macro segment (parabolic region), the cosine amplitude is interpolated with
a second order polynomial. Linear interpolation is used for the remaining segments
(corresponding to phase $0.5 \le x < 1.0$), which make up the linear region. The
interpolation configuration can be seen in Figure 3.1.
Figure 3.1. Quasi-linear non-uniform segmentation.
3.1 SFDR CALCULATION AND COEFFICIENT
GENERATION
As stated above, the first step to designing a DDFS architecture is to calculate the
theoretical upper bound of the SFDR. In order to create the matrix used in (2.28), a modified
version of the interpolating polynomial in equation (2.10) is used
,0
,0
0.5
0.5,
2,3,4,5.
The odd harmonic coefficients of the Fourier series are integrated as
(3.1)
2
cos
1,3, … 2
1,
π
2∑
cos
π
32.
,
(3.2)
Simplifying the above equation for matrix multiplication yields
2∑
2
(3.3)
where
(3.4)
nπ
cos
2
,0
2
,
2.
Just as in [2],
(3.5)
where the matrices are composed as follows
,
,
1
,
,
where
1
1
(3.6)
5
(3.7)
1
(3.8)
5
(3.9)
is defined in equation (2.16). Just as in equations (2.23)-(2.26), matrix
multiplication results in
(3.10)
where
|
2
|
|…|
|…|
(3.11)
(3.12)
and where
∈
,
∈
∈
.
(3.13)
The number of odd harmonics N used for calculations is 32, resulting in 11 equal
minimax error values. To achieve maximum SFDR, the infinity norm of the error
vector,‖
‖∞
′
∞
, is minimized using convex optimization. The infinity norm of a vector is defined as the
maximum absolute value among the elements of the vector,
    $\|\mathbf{v}\|_\infty = \max\left(|v_1|, |v_2|, \ldots, |v_n|\right)$.    (3.14)
This step minimizes the maximum element in the error vector. To use this
optimization method, the convex optimization toolbox CVX, developed by Dr. Stephen Boyd
at Stanford University, was installed to run with MATLAB. The system is then constrained
by
\[
\|e\|_\infty \le \alpha,
\tag{3.15}
\]
where e denotes the error vector and the variable α is introduced to be minimized by the
algorithm in the CVX tool. The algorithm returns the optimized values of the coefficient
vector as well as α, which is not used unless comparing results with linear programming
methods. Recall from above that the coefficient vector is the one-dimensional, augmented
matrix of coefficients
[Equation (3.16)]
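The constraint (3.15) places the minimax problem in epigraph form. As a sketch only, with $e_i(c)$ introduced here as hypothetical notation for the $i$-th of $L$ elements of the error vector, viewed as a function of the coefficient vector $c$, the problem can be written as follows.

```latex
\[
\begin{aligned}
\min_{c,\;\alpha} \quad & \alpha \\
\text{subject to} \quad & -\alpha \le e_i(c) \le \alpha, \qquad i = 1, \ldots, L .
\end{aligned}
\]
```

Assuming each $e_i(c)$ is affine in $c$, as the harmonic coefficients are linear in the interpolation coefficients, every constraint is linear in $(c, \alpha)$. The problem is then also a linear program, which is why the returned α can be compared against linear programming results.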
The resultant coefficients can be seen in Table 3.1. Quantizing the coefficients causes
the SFDR to drop below the upperbound of 68.4 dBc previously calculated.
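As a rough numerical cross-check, the odd harmonics can be estimated directly from the unquantized coefficients of Table 3.1. The Python sketch below is an illustration, not the MATLAB/CVX procedure used in this thesis. It assumes the first segment is y_1 + m_1 x^2 and the remaining segments are y_k + m_k x on the normalized phase, uses the segment boundaries from Appendix A, and integrates with a simple midpoint rule. The names harmonic and sfdr_db are hypothetical. Because the tabulated values are rounded to four decimals, the printed figure need not match the 68.4 dBc bound exactly.

```python
import math

# Segment boundaries and unquantized coefficients (Table 3.1).
X = [0.0, 0.5, 0.625, 0.75, 0.875, 1.0]
Y = [0.9982, 1.3287, 1.4130, 1.5135, 1.5781]       # initial amplitudes
M = [-1.1826, -1.2337, -1.3713, -1.5062, -1.5789]  # parabolic / linear coefficients


def p(x: float, k: int) -> float:
    """Assumed interpolating polynomial on segment k (0-based index)."""
    return Y[k] + M[k] * (x * x if k == 0 else x)


def harmonic(n: int, steps: int = 2000) -> float:
    """Midpoint-rule estimate of a_n = 2 * integral_0^1 p(x) cos(n*pi*x/2) dx."""
    total = 0.0
    for k in range(5):
        a, b = X[k], X[k + 1]
        h = (b - a) / steps
        for i in range(steps):
            x = a + (i + 0.5) * h
            total += p(x, k) * math.cos(n * math.pi * x / 2) * h
    return 2.0 * total


def sfdr_db(num_odd: int = 32) -> float:
    """Ratio of the fundamental to the largest odd spur up to 2*num_odd - 1, in dB."""
    fundamental = abs(harmonic(1))
    worst_spur = max(abs(harmonic(n)) for n in range(3, 2 * num_odd, 2))
    return 20.0 * math.log10(fundamental / worst_spur)


if __name__ == "__main__":
    print("a_1  =", round(harmonic(1), 5))
    print("SFDR =", round(sfdr_db(), 2), "dBc (estimate)")
```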
To succeed in implementing an efficient design, more optimization is necessary.
Convex optimization is no longer a valid technique due to the quantized system’s now
nonlinear behavior.
An additional optimization step using the Nelder-Mead algorithm has been taken to
reduce the quantization error relative to an ideal sinusoid and thus increase the final
SFDR. This method minimizes a given variable with respect to a number of additional
variables. To accommodate this format, the SFDR is negated while the rest of the variables
are adjusted to ‘minimize’ the negated SFDR. The input variables to the algorithm are the
initial amplitudes and the linear and parabolic coefficients, as well as a translation
parameter, which is used to reduce the discontinuity caused by quantizing the coefficients.
The difference can be seen in Figure 3.2. The left-hand graph shows a continuous sinusoid
using the translation, while on the right a discontinuity can be seen at x = 1024, which
corresponds to a phase of π/2.

Table 3.1. Coefficients Calculated by Convex Optimization

Segment (k)   Initial amplitude   Amplitude × 2^10   Coefficient   |Coefficient| × 16 (binary)
1             0.9982              1022               -1.1826       10010
2             1.3287              1360               -1.2337       10011
3             1.4130              1446               -1.3713       10101
4             1.5135              1549               -1.5062       11000
5             1.5781              1616               -1.5789       11001
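The binary column of Table 3.1 is consistent with truncating the magnitude of each parabolic or linear coefficient to four fractional bits. This is an inference from the tabulated numbers rather than a statement in the text. The Python sketch below reproduces that column under this assumption; the helper name quantize_magnitude is hypothetical.

```python
import math

# Parabolic / linear coefficients of segments 1..5 (Table 3.1).
COEFFICIENTS = [-1.1826, -1.2337, -1.3713, -1.5062, -1.5789]


def quantize_magnitude(value: float, frac_bits: int = 4) -> int:
    """Truncate |value| to an unsigned integer with frac_bits fractional bits."""
    return math.floor(abs(value) * (1 << frac_bits))


# One integer bit plus four fractional bits gives a 5-bit code per segment.
codes = [format(quantize_magnitude(c), "05b") for c in COEFFICIENTS]

if __name__ == "__main__":
    for k, (c, code) in enumerate(zip(COEFFICIENTS, codes), start=1):
        print(k, c, code, int(code, 2) / 16)
```

Read with the binary point after the first bit, each code is a short sum of powers of two (for example 1.0010 in binary is 1 + 1/8), which is what allows the coefficient products to be built from shifts and additions.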
Figure 3.2. Half wave sinusoid with translation (left) and without
(right).
It is worth noting, as can be seen by comparing both plots, that this optimization
reduces the peak-to-peak amplitude slightly. This reduction resulted in an increased SFDR.
Without the translation, the maximum attainable SFDR was approximately 55 dBc, while
including this variable resulted in an SFDR of 61.075 dBc. The new coefficients are shown
in Table 3.2. The translation parameter was calculated as 59 for this architecture.
Table 3.2. Implementation Coefficients

Segment (k)   Amplitude × 2^10   |Coefficient| × 16 (binary)
1             1012               10010
2             1338               10101
3             1376               10100
4             1563               11000
5             1628               11001
3.2 DESIGN MODELING
Using the coefficients listed in Table 3.2, the system was modeled in fixed-point
arithmetic in Simulink using a MATLAB function block, the underlying code of which is
shown in Appendix A. The multiplierless block diagram of the complete DDFS system is
shown in Figure 3.3. The output obtained from the Simulink model is shown in Figure 3.4.
As can be seen in Figure 3.4, the overflowing accumulator produces the necessary
triangle signal (bottom). This feeds the PSM, the output of which is seen in Figure 3.4
(middle). This also shows the function of the format converter. For the cosine signal in
Figure 3.4, the format converter raises the amplitude by 1024 for phases in the first and
fourth quadrants, while it inverts the signal in the second and third quadrants.
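The datapath described above can be sketched with plain integer arithmetic. The Python model below is an illustration under stated assumptions, not a bit-exact replica of the Simulink model. It uses integer right shifts and 10-bit one's complements for the shift-and-add terms, as in the Verilog of Appendix B. The output stage follows the MATLAB function of Appendix A, with an offset of 2^10 - 59 - 1 in the first and fourth quadrants and a 10-bit complement in the second and third. The names inv10 and phase_to_amplitude are hypothetical.

```python
AMPLITUDES = (1012, 1338, 1376, 1563, 1628)  # Table 3.2, segments 1..5
TRANSLATION = 59                             # translation parameter
MASK10 = (1 << 10) - 1


def inv10(v: int) -> int:
    """10-bit one's complement, used to subtract with adders only."""
    return ~v & MASK10


def phase_to_amplitude(trunc: int) -> int:
    """Map a 12-bit truncated phase to an unsigned cosine sample."""
    msb1 = (trunc >> 11) & 1
    msb2 = (trunc >> 10) & 1
    # 1's complementor: mirror the quadrant phase when the second MSB is set.
    w = (trunc & MASK10) ^ (MASK10 if msb2 else 0)
    wsq = (w * w) >> 10              # upper 10 bits of the 20-bit square
    sel = w >> 7                     # three MSBs select the segment
    if sel < 4:                      # segment 1: parabolic region
        y, terms = AMPLITUDES[0], (inv10(wsq), inv10(wsq >> 3))
    elif sel == 4:                   # segment 2
        y, terms = AMPLITUDES[1], (inv10(w), inv10(w >> 2), w >> 4)
    elif sel == 5:                   # segment 3
        y, terms = AMPLITUDES[2], (inv10(w), inv10(w >> 3), inv10(w >> 3))
    elif sel == 6:                   # segment 4
        y, terms = AMPLITUDES[3], (inv10(w), inv10(w >> 1))
    else:                            # segment 5
        y, terms = AMPLITUDES[4], (inv10(w), inv10(w >> 1), inv10(w >> 4))
    seg_out = (y + sum(terms)) & MASK10   # keep the lower 10 bits
    if msb1 ^ msb2:                       # second and third quadrants
        return inv10(seg_out)
    return seg_out + (1 << 10) - TRANSLATION - 1


if __name__ == "__main__":
    samples = [phase_to_amplitude(t) for t in range(1 << 12)]
    print(min(samples), max(samples))
```

Evaluating the model at the quadrant boundary (truncated phase 1023 and 1024) gives neighbouring output values, which is the continuity that the translation parameter is meant to provide.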
3.3 SYNTHESIS AND ANALYSIS
The model was also designed in a hardware description language (HDL). The resulting
Verilog code, shown in Appendix B, consists of three modules: the accumulator, the phase
to sine mapper (PSM), and the DDFS top-level module, which instantiates the previous two.
The accumulator is designed in sequential (clocked) logic while the PSM and DDFS are
combinational, meaning the output is changed in the same clock cycle as the input. Using
Xilinx ISE Design Suite 13.4, the Verilog code implementing the DDFS system was then
synthesized and simulated. The output of this HDL simulation was compared to that of the
fixed-point model implemented in Simulink, resulting in a difference of ±2 bits out of the
maximum output of 2048.

Figure 3.3. Non-uniform quasi linear interpolated DDFS schematic.
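The sequential accumulator described above can be modeled in a few lines. The Python sketch below is an illustration only; the class name PhaseAccumulator and the helper output_frequency are hypothetical. The wordlengths n = 24 and m = 12 follow the parameters of the top-level module in Appendix B, and the frequency relation is the standard one for an n-bit overflowing accumulator.

```python
class PhaseAccumulator:
    """n-bit overflowing accumulator whose top m bits feed the PSM."""

    def __init__(self, n: int = 24, m: int = 12) -> None:
        self.n = n
        self.m = m
        self.acc = 0

    def reset(self) -> None:
        self.acc = 0

    def clock(self, fcw: int) -> int:
        """Advance one clock cycle and return the truncated phase."""
        # The modulo-2^n wrap is the intentional overflow of the register.
        self.acc = (self.acc + fcw) & ((1 << self.n) - 1)
        return self.acc >> (self.n - self.m)


def output_frequency(fcw: int, f_clk: float, n: int = 24) -> float:
    """Synthesized frequency for frequency control word fcw and clock f_clk."""
    return fcw * f_clk / (1 << n)


if __name__ == "__main__":
    acc = PhaseAccumulator()
    print([acc.clock(1 << 12) for _ in range(4)])
    print(output_frequency(1 << 20, 174e6))
```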
The design was also synthesized as an ASIC using the Cadence BuildGates and SoC
Encounter tools with a TSMC 0.13 μm process and a 1.2 V supply standard cell library. This
step is to obtain the power and speed specifications of the design. As noted in [2], it is very
difficult to compare multiple architectures due to differences in SFDR choices, wordlengths
in the accumulator and subsequent calculations, and standard cell libraries. Since it is difficult
to analyze the performance of a number of DDFSs simultaneously, a common method of
comparison in literature [2] is to calculate the normalized area. This is done by dividing the
chip’s total silicon core area by the size of the feature squared. For this design, the feature
squared is 0.0169 μm². A comparison between the designs discussed in this thesis is shown in
Table 3.3 [2, 4, 6].

Figure 3.4. System output (top); PSM output (middle);
overflowing phase accumulator output (bottom).
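The normalized-area comparison described above amounts to one division per design. The Python sketch below recomputes it from the area and process values listed in Table 3.3. It assumes that the areas are in μm² and that the tabulated normalized areas are expressed in units of 10^5; both are inferred from the numbers. The names DESIGNS and normalized_area are hypothetical.

```python
# (core area in um^2, feature size in um), as listed in Table 3.3.
DESIGNS = {
    "[4]": (79000.0, 0.35),
    "[2]": (3756.0, 0.13),
    "[6]": (3580.0, 0.13),
    "this design": (8109.0, 0.13),
}


def normalized_area(area_um2: float, feature_um: float) -> float:
    """Core area divided by the square of the feature size (dimensionless)."""
    return area_um2 / feature_um ** 2


if __name__ == "__main__":
    for name, (area, feature) in DESIGNS.items():
        print(name, round(normalized_area(area, feature) / 1e5, 2), "x 10^5")
```

Dividing by the squared feature size removes the first-order effect of the process node, so the 0.35 μm design of [4] can be placed on the same scale as the 0.13 μm designs.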
The area of the design was realized as 8109 μm², with a maximum clock frequency of
174 MHz and a power consumption of 11.23 μW/MHz. The design of this thesis is currently
faster than [4], though it is not faster or more efficient than the designs in [6] or [2].
This result can be attributed to the large multiplier in our design. The technique in [6]
does not utilize any multipliers, while the method in [2] uses a pre-truncated squarer to
reduce the system cost of squaring the phase. Additionally, the truncated accumulator
wordlength in our design was one bit greater than that of [6] and [2]. This, in conjunction
with the output resolution of our design being greater than the resolution of [2] and [6] by
one and two bits, respectively, resulted in more complex circuitry and slightly lower
performance. This is not to say that our design could not be a better solution with more work.

Table 3.3. Design Comparison

Design                                      [4]*             [2]¥                          [6]‡                 Novel Design
Architecture                                Uniform Linear   Uniform Quasi-Linear (QLIP)   Non-uniform Linear   Non-uniform Quasi-Linear
Wordlength of Truncated Accumulator (bit)   14               11                            11                   12
Output resolution (bit)                     12               10                            9                    11
Number of segments (s)                      32               4                             9                    5
SFDR (dBc)                                  84.2             63.2                          62                   61.075
Process                                     0.35 μm CMOS     0.13 μm CMOS                  0.13 μm CMOS         0.13 μm CMOS
Max. Freq (MHz)                             100              313                           335                  174
Power (μW/MHz)                              79.9             4.9                           3.16                 11.23
Area (μm²)                                  79000            3756                          3580                 8109
Normalized Area (× 10^5)                    6.45             2.22                          2.12                 4.79

* J. M. P. Langlois and D. Al-Khalili, “Novel approach to the design of direct digital frequency synthesizers
based on linear interpolation,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 50, no. 9, pp.
567–578, Sept. 2003.
¥ A. Ashrafi, R. Adhami, and A. Milenkovic, “A direct digital frequency synthesizer based on the Quasi-linear
interpolation method,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 863–872, Apr. 2010.
‡ D. De Caro, N. Petra, and A. G. M. Strollo, “Direct Digital Frequency Synthesizer Using Nonuniform
Piecewise-Linear Approximation,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 10, pp. 2409–2419, Oct.
2011.
CHAPTER 4
CONCLUSION AND FUTURE WORK
This thesis has been focused on Direct Digital Frequency Synthesizers realized by
method of interpolation. A novel approach was introduced: interpolating via non-uniform
segmentation and estimating the amplitude of the output sinusoid with a quasi-linear
polynomial. A thorough understanding of polynomial interpolation is necessary for
calculation of the interpolating coefficients, which mathematically define the complete
system.
The first step in creating this architecture was to calculate the upperbound of the
SFDR by analyzing the harmonics of the best case output given the number of bits used for
calculations. The governing equations were derived from existing methods and modified to
fit the principle of this design. Upon successful completion, the system was modeled and the
output verified. Due to the quantization of the coefficients, a tolerance within 10% of the
upperbound of the SFDR was invoked. Finally, after confirmation of the modeled SFDR, the
system was realized on an FPGA and an ASIC chip to justify the design. A readout was taken
of the power consumption of the ASIC chip and the area necessary to realize such a system.
The system performance was compared against similar state-of-the-art interpolated DDFS
designs.
The topics for further investigation include additional optimization parameters used
in the design and modification of our circuitry to see if the performance can be improved
upon.
The first optimization technique is to incorporate the wordlengths of the truncated
phase and system output in the algorithm. Minimizing these values while still returning an
acceptable SFDR will reduce the complexity of the architecture.
In addition to optimization, changing the layout of the architecture may result in a
more favorable system performance. Experiments should be undertaken which choose different
non-uniform segmentation schemes. The first segment of our design, which was half of the
first quadrant, could be bisected into two segments, for example.
On the topic of non-uniform segmentation, one could evaluate the effect of designing
a DDFS architecture starting from a greater number of initial segments. Our technique may
prove to be more valuable when designing for a higher SFDR which requires the use of more
segments.
REFERENCES
[1] J. Tierney, C. M. Rader, and B. Gold, “A digital frequency synthesizer,” IEEE Trans.
Audio Electroacoust., vol. 19, pp. 48–57, Mar. 1971.
[2] A. Ashrafi, R. Adhami, and A. Milenkovic, “A direct digital frequency synthesizer
based on the Quasi-linear interpolation method,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 57, no. 4, pp. 863–872, Apr. 2010.
[3] D. De Caro and A. G. M. Strollo, “High-performance direct digital frequency
synthesizers using piecewise-polynomial approximation,” IEEE Trans. Circuits Syst. I,
Reg. Papers, vol. 52, pp. 324–336, Feb. 2005.
[4] J. M. P. Langlois and D. Al-Khalili, “Novel approach to the design of direct digital
frequency synthesizers based on linear interpolation,” IEEE Trans. Circuits Syst. II,
Analog Digit. Signal Process., vol. 50, no. 9, pp. 567–578, Sept. 2003.
[5] A. Ashrafi and R. Adhami, “Theoretical upperbound of the spurious free dynamic
range in direct digital frequency synthesizers realized by polynomial interpolation
methods,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 10, pp. 2252–2261,
Oct. 2007.
[6] D. De Caro, N. Petra, and A. G. M. Strollo, “Direct Digital Frequency Synthesizer
Using Nonuniform Piecewise-Linear Approximation,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 58, no. 10, pp. 2409–2419, Oct. 2011.
[7] H. T. Nicholas III and H. Samueli, “An analysis of the output spectrum of direct digital
frequency synthesizers in the presence of phase-accumulator truncation,” in Proc. 41st
Annu. Frequency Control Symp., pp. 495–502, 1987.
[8] G. C. Gielis, R. van de Plassche, and J. van Valburg, “A 540-MHz 10-b polar-to-cartesian
converter,” IEEE J. Solid-State Circuits, vol. 26, no. 11, pp. 1645–1650, Nov. 1991.
[9] A. Madisetti, A. Y. Kwentus, and A. N. Willson, “A 100-MHz, 16-b, direct digital
frequency synthesizer with a 100-dBc spurious-free dynamic range,” IEEE J. Solid-State
Circuits, vol. 34, no. 8, pp. 1034–1043, Aug. 1999.
[10] J. M. P. Langlois and D. Al-Khalili, “Phase to sinusoid amplitude conversion
techniques for direct digital frequency synthesis,” Proc. IEE Circuit Devices Syst., vol.
151, no. 6, pp. 519–528, Dec. 2004.
[11] F. Curticăpean and J. Niittylahti, “Exact analysis of spurious signals in direct digital
frequency synthesizers due to phase truncation,” Electron. Lett., vol. 39, no. 6, pp. 499–
501, Mar. 2003.
[12] F. Harris, “Ultra Low Phase Noise DSP Oscillator [DSP Tips & Tricks],” Signal
Process. Magazine IEEE, vol. 24, no. 4, pp. 121–124, July 2007.
[13] A. M. Sodagar and G. R. Lahiji, “A pipelined ROM-less architecture for sine-output
direct digital frequency synthesizers using the second-order parabolic approximation,”
IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process., vol. 48, no. 9, pp. 850–857,
Sept. 2001.
[14] A. M. Sodagar and G. R. Lahiji, “A pipelined ROM-less architecture for sine-output
direct digital frequency synthesizers using the second-order parabolic approximation,”
IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process., vol. 48, no. 9, pp. 850–857,
Sept. 2001.
[15] J. Valls, T. Sansaloni, A. Perez-Pascual, V. Torres, and V. Almenar, “The use of
CORDIC in software defined radios: A tutorial,” IEEE Comm. Magazine, vol. 44, no. 9,
pp. 46–50, Sept. 2006.
[16] C. Dick, F. Harris, and M. Rice, “Synchronization in Software Defined Radios – Carrier
and Timing Recovery Using FPGAs,” IEEE Symposium On Field-Programmable
Custom Computing Machines, vol. 46, pp. 195–204, Apr. 2000.
[17] D. R. Jahagirdar, “A high dynamic range miniature DDS-based FMCW radar,” IEEE
Radar Conference (RADAR), vol. 10, pp. 0870–0873, 7–11 May 2012.
[18] Y. T. Im, J. H. Lee, and S. O. Park, “A DDS and PLL-based X-band FMCW radar
system,” IEEE MTT-S International Microwave Workshop Series on Intelligent Radio
for Future Personal Terminals (IMWS-IRFPT), vol. 59, pp. 1–2, Aug. 2011.
[19] G. W. Reitwiesner, “Binary arithmetic,” Adv. Comput., vol. 1, pp. 232–313, Sept. 1960.
APPENDIX A
MATLAB DDFS FUNCTION
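function [y_out] = DDFS(u)
% DDFS  Phase-to-amplitude mapping of the non-uniform quasi-linear DDFS.
%   u is the 24-bit phase accumulator word; y_out is the cosine sample.

% Phase decomposition
u1 = bitget(u,24);               % MSB
u2 = bitget(u,23);               % second MSB
u3 = bitsliceget(u,22,13);       % 10-bit phase within the quadrant
u4 = bitcmp(bitxor(u1,u2));      % 1 in the first and fourth quadrants

% 1's complementor: mirror the phase when the second MSB is set
z   = u2*(2^10-1);
u2b = fi(z,0,10,0);
u5  = bitxor(u3,u2b);

% Squared phase for the parabolic segment (upper 10 bits of the product)
u6 = u5*u5;
u6 = bitsliceget(u6,20,11);

u5bar = bitcmp(u5,10);
u6bar = bitcmp(u6,10);

Yk  = [1012 1338 1376 1563 1628];    % amplitude coefficients (Table 3.2)
mod = 59;                            % translation parameter

% Segment selection and shift-and-add evaluation of the interpolant
if u5<2^9                                    % segment 1 (parabolic)
    value1  = fi(Yk(1),0,11,0)+u6bar+bitcmp(u6/2^3);
    seg_out = bitsliceget(value1,10,1);
elseif u5>=4/8*2^10 && u5<5/8*2^10           % segment 2
    value2  = u5bar+bitcmp(u5/2^2)+(u5/2^4)+fi(Yk(2),0,11,0);
    seg_out = bitsliceget(value2,10,1);
elseif u5>=5/8*2^10 && u5<6/8*2^10           % segment 3
    value3  = fi(Yk(3),0,11,0)+u5bar+bitcmp(u5/2^3)+bitcmp(u5/2^3);
    seg_out = bitsliceget(value3,10,1);
elseif u5>=6/8*2^10 && u5<7/8*2^10           % segment 4
    value4  = fi(Yk(4),0,11,0)+u5bar+bitcmp(u5/2^1);
    seg_out = bitsliceget(value4,10,1);
elseif u5>=7/8*2^10 && u5<2^10               % segment 5
    value5  = fi(Yk(5),0,11,0)+u5bar+bitcmp(u5/2^1)+bitcmp(u5/2^4);
    seg_out = bitsliceget(value5,10,1);
else
    seg_out = fi(0,0,10,0);
end

% Format converter: offset in quadrants 1 and 4, complement in 2 and 3
if u4==1
    y_out = seg_out+fi(2^10-mod-1,0,10,0);
else
    y_out = fi(bitcmp(seg_out),0,11,0);
end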
APPENDIX B
VERILOG HDL IMPLEMENTATION
module nl_qlip5b (in,clk,reset,out,outsum);
  parameter n=24; // Input wordlength
  parameter m=12; // Truncated Phase wordlength
  input [n-1:0] in;
  input clk,reset;
  output reg [m-2:0] out;
  output [m-2:0] outsum;
  wire msb1,msb2;
  wire [m-1:0] trunc;
  wire [m:0] add_out;
  wire [m-3:0] w;
  wire [2*(m-2)-1:0] wtemp;
  wire [m-3:0] wsq,w_1,w_2,w_3,w_4;
  reg [m-3:0] mux1,mux2,mux3;
  reg [m-2:0] mux4;
  wire [m:0] outtemp;
  wire [2:0] sel;
  wire [m-2:0] carry1,sum1,carry,sum;
  wire co1,co2; // Carry-outs of the carry-save adders

  accu #(n,m) U1 (in, trunc, reset, clk);

  assign msb1=trunc[m-1];
  assign msb2=trunc[m-2];
  assign w={(m-2){msb2}} ^ trunc[m-3:0]; // 1's Complementor
  assign sel=w[m-3:m-5];
  assign wtemp=w*w;
  assign wsq=wtemp[2*(m-2)-1:2*(m-2)-10];
  assign w_1=(w>>1);
  assign w_2=(w>>2);
  assign w_3=(w>>3);
  assign w_4=(w>>4);

  // MUX Tree
  always @ *
  begin
    mux1=10'd0;
    case (sel[2])
      1'b0 : mux1=~wsq;
      default : mux1=~w;
    endcase
  end

  always @ *
  begin
    mux2=10'd0;
    case (sel)
      3'b100 : mux2=~w_2;
      3'b101 : mux2=~w_3;
      3'b110 : mux2=~w_1;
      3'b111 : mux2=~w_1;
      default : mux2=~(wsq>>3);
    endcase
  end

  always @ *
  begin
    mux3=10'd0;
    case (sel)
      3'b100 : mux3=(w>>4);
      3'b101 : mux3=~w_3;
      3'b110 : mux3=10'b0;
      3'b111 : mux3=~w_4;
      default : mux3=10'd0;
    endcase
  end

  always @ (sel)
  begin
    mux4=10'd0;
    case (sel)
      3'b100 : mux4=11'd1338;
      3'b101 : mux4=11'd1376;
      3'b110 : mux4=11'd1563;
      3'b111 : mux4=11'd1628;
      default : mux4=11'd1012;
    endcase
  end

  DW01_csa #(11) U8 ({1'b0,mux1},{1'b0,mux2},{1'b0,mux3},1'b0,carry1,sum1,co1);
  DW01_csa #(11) U9 (carry1,sum1,mux4,1'b0,carry,sum,co2);

  assign add_out={co1,carry}+{co2,sum};
  assign outtemp=mux1+mux2+mux3+mux4;
  assign outsum=outtemp[m-2:0];

  always @ (add_out or msb1 or msb2) begin
    if (msb1 ^ msb2)
      out=~{1'b0,add_out[9:0]};
    else
      out=add_out[9:0] + 10'd964;
  end
endmodule
//----------------------------------------------------------------------------//
//
//   This confidential and proprietary software may be used only
//   as authorized by a licensing agreement from Synopsys Inc.
//   In the event of publication, the following notice is applicable:
//
//   (C) COPYRIGHT 1992 - 2001 SYNOPSYS INC.
//   ALL RIGHTS RESERVED
//
//   The entire notice above must be reproduced on all authorized
//   copies.
//
// AUTHOR: SS
//
// VERSION: Simulation Architecture
//
//----------------------------------------------------------------------------
module DW01_csa (a,b,c,ci,carry,sum,co);
  parameter width=4;
  // port declarations
  input [width-1 : 0] a,b,c;
  input ci;
  output [width-1 : 0] carry,sum;
  output co;
  reg [width-1 : 0] carry,sum;
  reg co;
  integer i;
  always @(a or b or c or ci)
  begin
    carry[0] = c[0];
    carry[1] = (a[0]&b[0])|((a[0]^b[0])&ci);
    sum[0] = a[0]^b[0]^ci;
    for (i = 1; i<= width-2; i = i + 1) begin
      carry[i+1] = (a[i]&b[i])|((a[i]^b[i])&c[i]);
      sum[i] = a[i]^b[i]^c[i];
    end // loop
    sum[width-1] = a[width-1]^b[width-1]^c[width-1];
    co = (a[width-1]&b[width-1])|((a[width-1]^b[width-1])&c[width-1]);
  end // process
endmodule // DW01_csa;
module accu(in,acc_out,rst,clk);
  parameter n=24; // Register Length
  parameter m=11; // output length
  input clk,rst;
  input [n-1:0] in;
  output [m-1:0] acc_out;
  reg [n-1:0] acc;
  always @(posedge clk)
  begin
    if (rst) acc<=0;
    else acc<=acc+in;
  end
  assign acc_out=acc[n-1:n-m];
endmodule
