
International Journal of Research in Engineering and Applied Sciences
DESIGN AND IMPLEMENTATION OF FFT
ARCHITECTURE FOR REAL-VALUED SIGNALS
BASED ON RADIX-2³ ALGORITHM
1Pradnya Zode, 2A.Y. Deshmukh and 3Abhilesh S. Thor
1,3 Assistant Professor, Yeshwantrao Chavan College of Engineering
2 Professor, Department of Electronics Engg., G.H. Raisoni College of Engineering, Nagpur
(E-mail: 1 [email protected], [email protected], [email protected])
Abstract- A new FFT architecture for real-valued signals (RFFT) is proposed using the radix-2³ algorithm. It is based on
modifying the flow graph of the FFT algorithm so that it has both real and complex datapaths. Redundant operations in the
flow graph are replaced by the imaginary part of the computation. Using the folding technique, an RFFT architecture with any
level of parallelism can be derived. This RFFT architecture has lower hardware complexity than the radix-2 and radix-2²
architectures in terms of adders, multipliers and delays. An N-point 2-parallel radix-2³ architecture requires (log8 N - 1)
complex multipliers, 2 log2 N adders and 3N/2 - 2 delays. Because the RFFT is used in real-time applications and in portable
devices, for which low power consumption is the main requirement, a carry propagate adder, which has the least power
consumption of the adders evaluated, and a CSD multiplier are selected for the proposed architecture.
Keywords- FFT, Parallel Processing, Pipelining, Real Signals, Radix-2³, Folding
1. Introduction
The fast Fourier transform (FFT) is one of the most widely
used algorithms in digital signal processing [1]. Interest
in computing the FFT of real-valued signals (RFFT) has
grown because most physical signals are real. The RFFT is
an important algorithm in many real-time applications in
the area of digital signal processing (DSP) [2]. Hardware
complexity in asymmetric digital subscriber line (ADSL)
systems [3] can be reduced by using the RFFT. In
applications such as spectral analysis and filtering [4],
the FFT helps to analyze spectral components. The RFFT is
also important for analyzing signals such as
electroencephalography (EEG) and electrocardiography (ECG)
signals. It is used in various portable devices, where its
lower complexity helps to reduce power consumption.
The FFT is also used to estimate power spectral density,
which can indicate whether a signal is as expected or
contains a problem. This paper presents the design of an
RFFT architecture. The flow graph is modified and the
redundant part is replaced by the imaginary part of the
computation in order to reduce complexity. Using the
folding technique, an RFFT architecture with any level of
parallelism can be derived. Because the imaginary part is
injected into the butterfly structure, the architecture has
both real and complex datapaths. The RFFT is used in
real-time applications and in portable devices, for which
low power consumption is the main requirement, so a carry
propagate adder, which has the least power consumption of
the adders evaluated, and a CSD multiplier are selected
for our architecture.
The paper is organized as follows. Section 2 describes
previous work related to the RFFT. Section 3 describes the
proposed architecture for the 16-point RFFT radix-2³ DIF.
Section 4 describes the FPGA implementation of the adder
and the CSD multiplier. The results are compared and
discussed in Section 5 and, finally, concluding remarks
are given in Section 6.
ISSN (Print): 2249-9210 | ISSN (Online): 2348-1862
2. Previous Work
Various algorithms for computing the RFFT have been
presented in the past, but they do not have a regular
geometry, which is very important for deriving a pipelined
architecture. The first pipelined architecture for
real-valued signals was designed in [5], but it is
restricted to the radix-2 algorithm and a 4-parallel RFFT
architecture. Also, it has only real datapaths. A design
that derives FFT architectures for both real and complex
inputs is presented in [6]. However, even after the
redundant operations are removed, that architecture still
calculates the redundant samples, and its hardware is not
fully utilized.
In [7] a pipelined RFFT architecture is derived, but it has
higher hardware complexity than our proposed architecture,
as it requires more adders, multipliers and delays than the
radix-2³ algorithm in [8].
© IJREAS, Vol. 02, Issue 02, July 2014
3. Proposed Work
The N-point discrete Fourier transform (DFT) of a sequence
x[n] is defined as
$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1$$
where
$$W_N = e^{-j(2\pi/N)}$$
In the RFFT the inputs are real. If x[n] is real, then the
output X[k] is conjugate symmetric:
$$X[N-k] = X^{*}[k] \tag{1}$$
Fig. 2 Butterfly structure I for proposed architecture [8]
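The conjugate symmetry in (1) can be checked numerically. The following is a minimal sketch, not part of the proposed architecture: it evaluates the DFT definition directly for an arbitrary real 16-point input and lists the output bins that carry no new information. The function names `dft` and `redundant_bins` and the sample values are assumptions made only for this illustration.

```python
import cmath

def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * W_N^(n*k)."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)  # twiddle factor W_N
    return [sum(x[n] * w ** (n * k) for n in range(n_points))
            for k in range(n_points)]

def redundant_bins(n_points):
    """Indices k > N/2 whose value is the conjugate of bin N - k."""
    return list(range(n_points // 2 + 1, n_points))

# Arbitrary real samples (illustrative values only).
x = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -2.0, 4.0,
     1.0, 0.0, -0.5, 2.5, 3.0, -1.5, 0.75, 1.0]
N = len(x)
X = dft(x)

# Largest deviation from X[N-k] = conj(X[k]) over k = 1 .. N-1.
max_err = max(abs(X[N - k] - X[k].conjugate()) for k in range(1, N))
print("max symmetry error:", max_err)
print("redundant bins:", redundant_bins(N))
```

For N = 16 the list of redundant bins has (N/2) - 1 = 7 entries, which is the number of output calculations that the symmetry allows to be dropped.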
Because of this symmetry, (N/2)-1 output calculations are
redundant and can be removed. The proposed work involves
the following two steps.
A. Modified butterfly structure
Fig. 3 Butterfly structure II for proposed architecture [8]
Redundant samples can be found using the approach in [5].
Once found, these samples are removed. However, removing
them leads to an irregular geometry, so efficient
pipelining cannot be done. To obtain a regular geometry,
the redundant operations are instead replaced by the
imaginary part of the computation, which makes efficient
pipelining possible. The modified flow graph of the
16-point RFFT radix-2³ DIF is shown in Fig. 1.
B. Folding
Using the folding technique in [9], a pipelined
architecture can be derived from the data flow graph (DFG).
Folding also leads to an optimized datapath [10]. The nodes
of the DFG can be implemented by butterfly structures I and
II. The proposed 2-parallel architecture is shown in Fig. 4.
Fig. 4. Proposed work
4. FPGA Implementation
In this section, various adders are simulated on a
Spartan-6 XC6SLX16 device in the CSG324 package, and the
adder with the least power consumption is selected. A CSD
multiplier is also simulated on the Spartan-6.
Fig. 1 Modified flow graph of 16-point RFFT radix-2³ DIF
A. Adder units
The RFFT is generally used in real-time applications and in
portable devices such as ECG and EEG equipment. Low power
and small area are the main requirements for such devices,
so the adder has to be selected accordingly. Carry
propagate [11], carry skip [12] and carry look-ahead [13]
adders are therefore simulated on a Spartan-6 XC6SLX16
device in the CSG324 package. Table I shows their
performance in terms of area, power and delay. Among these,
the carry propagate adder suits best, as it has the least
power consumption (30 mW) and the smallest area, at the
cost of the largest delay. Its RTL view is shown in Fig. 5
and its waveform in Fig. 6.
Since the flow graph has both complex and real parts, the
architecture has both complex and real datapaths. Two
butterfly structures are used to handle this. The first is
straightforward: it takes two real inputs and consists of a
real adder and a real subtractor. This butterfly structure
is shown in Fig. 2. The second butterfly structure is shown
in Fig. 3.
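Butterfly structure I can be illustrated with a minimal sketch. It assumes the usual decimation-in-frequency pairing of x[n] with x[n + N/2] in the first stage; the names `butterfly_real`, `first_dif_stage` and `dft` are hypothetical and are not taken from the paper. With real inputs, this stage needs only real additions and subtractions, and the sums already determine the even-indexed DFT outputs.

```python
import cmath

def butterfly_real(a, b):
    """Butterfly structure I: one real adder and one real subtractor."""
    return a + b, a - b

def first_dif_stage(x):
    """Apply real butterflies to the pairs (x[n], x[n + N/2])."""
    half = len(x) // 2
    pairs = [butterfly_real(x[n], x[n + half]) for n in range(half)]
    sums = [s for s, _ in pairs]
    diffs = [d for _, d in pairs]
    return sums, diffs

def dft(x):
    """Direct DFT, used here only as a reference for checking."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)
    return [sum(x[n] * w ** (n * k) for n in range(n_points))
            for k in range(n_points)]

# Arbitrary real 16-point input (illustrative values only).
x = [float(n % 5) - 1.5 for n in range(16)]
sums, diffs = first_dif_stage(x)

# The 8-point DFT of the sums equals the even-indexed bins of the
# 16-point DFT, so the first stage is purely real for real inputs.
X_full = dft(x)
X_even = dft(sums)
err = max(abs(X_full[2 * k] - X_even[k]) for k in range(8))
print("max error on even bins:", err)
```

The complex datapath only becomes necessary in later stages, after twiddle factors have been applied to the difference outputs.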
The CSD multiplier has a multiplexer that is controlled by
the pairs of bits that encode the constant. Depending on
the input pair, the multiplexer output is the input data,
the inverse of the input data, or all zeros. Shifts are
applied depending on these pairs of bits. The multiplier
also has circuitry that receives and combines these
outputs, and further shifts are applied to generate the
final multiplier output. A normal multiplier and a CSD
multiplier [16] (8 bit x 8 bit) are simulated on a
Spartan-6 XC6SLX16 device in the CSG324 package, and their
performance in terms of area, power and delay is shown in
Table II.
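The recoding behind the CSD multiplier can be sketched in software. The example below is an illustration under stated assumptions, not the simulated hardware design: each CSD digit is +1, 0 or -1, which corresponds to the multiplexer selecting the input data, all zeros, or the inverse of the input data. The function names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, LSB first."""
    digits = []
    while value != 0:
        if value & 1:
            # +1 when value mod 4 == 1, -1 when value mod 4 == 3.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

def csd_multiply(x, constant):
    """Multiply x by a constant with one shifted add or subtract per nonzero digit."""
    result = 0
    for shift, digit in enumerate(to_csd(constant)):
        if digit == 1:
            result += x << shift      # select the input data
        elif digit == -1:
            result -= x << shift      # select the negated input data
        # digit == 0 selects all zeros, so nothing is added
    return result

digits = to_csd(7)                    # 7 = 8 - 1
nonzero_csd = sum(1 for d in digits if d != 0)
nonzero_binary = bin(7).count("1")
print(digits, nonzero_csd, nonzero_binary)
print(csd_multiply(5, 7))
```

For the constant 7 the binary form has three nonzero bits, while the CSD form has two nonzero digits, so one partial product is saved. No two adjacent CSD digits are nonzero, which is what bounds the number of partial products.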
TABLE I
Adder (16 bit)      Power (mW)   Delay (ns)   Area (slices)
Carry propagate     30           15.243       25 / 9112
Carry look ahead    42           13.567       59
Carry skip          37           14.767       44
TABLE II
Multiplier (8 bit x 8 bit)   Power (mW)   Delay (ns)   Area (slices)
Normal multiplier            35           11.7         116 / 9112
CSD multiplier               32           10.5         88
Table II shows that the CSD multiplier has lower delay and
power consumption than the normal multiplier, so it is
selected for our proposed architecture. Its waveform is
shown in Fig. 7 and its RTL view in Fig. 8.
Fig. 5. RTL view of carry propagate adder
Fig. 6. Output waveform of carry propagate adder
Fig. 7. Output waveform of CSD multiplier
B. Multiplier
In a normal multiplier, multiplication is done by shifting
to produce partial products and then adding all the partial
products. Multiplying two N-bit numbers generates N partial
products of N bits each (N x N partial-product bits), and
(N-1) adders are then required if two-input N-bit adders
are used. The number of hardware components and the time
required for multiplication therefore increase. A
multiplier that generates fewer partial products is needed,
which in turn reduces the time as well as the power
consumption of multiplication. The CSD multiplier suits
this requirement best.
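The partial-product count of the normal multiplier can be made concrete with a short sketch. It assumes unsigned operands and a simple chain of two-input additions; the function name `shift_add_multiply` is hypothetical and the code is an illustration only, not the simulated design.

```python
def shift_add_multiply(a, b, n_bits=8):
    """Multiply two unsigned n_bits-wide integers by shift-and-add.

    Returns the product, the number of partial-product rows and the
    number of two-input additions used to combine them.
    """
    partial_products = []
    for i in range(n_bits):
        bit = (b >> i) & 1
        # One row per bit of b: a shifted copy of a, or zero.
        partial_products.append((a << i) if bit else 0)

    product = partial_products[0]
    additions = 0
    for row in partial_products[1:]:
        product += row
        additions += 1
    return product, len(partial_products), additions

print(shift_add_multiply(13, 11))
```

For 8-bit operands this gives 8 partial-product rows and 7 additions, matching the N rows and (N-1) adders described above. Rows that are zero still occupy an adder in this scheme, which is the overhead a recoding with fewer nonzero digits avoids.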
The proposed CSD multiplier is an efficient way of
performing multiplication. It reduces the number of partial
products by using the redundancy of the canonical signed
digit (CSD) code [14] and provides an efficient way of
multiplication as in [15]. Because the number of partial
products is reduced, the CSD multiplier is more hardware
efficient and requires less time and less power for
multiplication than the normal multiplier. The constant by
which the multiplicand is multiplied is represented as
pairs of bits.
Fig. 8. RTL view of CSD multiplier
The proposed radix-2³ architecture requires less hardware
than the previous designs.
C. Simulation result of FFT architecture
Fig. 9 shows the inputs to the FFT and Fig. 10 shows the
outputs of the FFT.

TABLE III
(Formula for an N-point FFT, with the value for N = 64)
Architecture                      Complex multipliers    Adders               Delays
Radix-2 [8]                       2(log4 N - 1) = 4      4 log2 N = 24        2N = 128
Radix-2² [8]                      log4 N - 1 = 2         4 log2 N - 2 = 22    2(N - 2) = 124
Proposed radix-2³ (2-parallel)    log8 N - 1 = 1         2 log2 N = 12        3N/2 - 2 = 94
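The entries of Table III can be reproduced from the formulas. The sketch below is only an arithmetic check of the tabulated expressions; the helper names `ilog` and `hardware_cost` are hypothetical. It assumes N is a power of 2, 4 and 8 at the same time (as N = 64 is), so that every logarithm is an integer.

```python
def ilog(n, base):
    """Exact integer logarithm; raises if n is not a power of base."""
    exponent, value = 0, 1
    while value < n:
        value *= base
        exponent += 1
    if value != n:
        raise ValueError(f"{n} is not a power of {base}")
    return exponent

def hardware_cost(n):
    """Return (complex multipliers, adders, delays) per architecture in Table III."""
    return {
        "radix-2": (2 * (ilog(n, 4) - 1), 4 * ilog(n, 2), 2 * n),
        "radix-2^2": (ilog(n, 4) - 1, 4 * ilog(n, 2) - 2, 2 * (n - 2)),
        "radix-2^3 (2-parallel)": (ilog(n, 8) - 1, 2 * ilog(n, 2), 3 * n // 2 - 2),
    }

for name, (multipliers, adders, delays) in hardware_cost(64).items():
    print(f"{name}: {multipliers} C.M, {adders} adders, {delays} delays")
```

For N = 64 this yields 4, 24 and 128 for radix-2, 2, 22 and 124 for radix-2², and 1, 12 and 94 for the proposed 2-parallel radix-2³ architecture, in agreement with the table.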
Fig. 9 Inputs to FFT
Fig. 10 Outputs of FFT
5. Comparison and Analysis
Table III compares the hardware complexity, in terms of
adders, multipliers and delays, of the proposed
architecture with previous architectures for an N-point
FFT, and gives N = 64 as an example. It is observed that
the radix-2³ architecture requires fewer adders,
multipliers and delays than the other architectures.
6. Conclusion
An efficient architecture for the computation of the RFFT
has been proposed in this paper. The datapaths are
optimized by folding. The architecture also requires fewer
adders, multipliers and delays than previous architectures.
It has low power consumption in its adders, because a carry
propagate adder is used, and it is relatively fast, because
a CSD multiplier is used.
References
[1] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Englewood Cliffs, NJ, USA: Prentice Hall, 1998.
[2] H. Chi and Z. Lai, “A cost-effective memory-based real-valued FFT and Hermitian symmetric IFFT processor for DMT-based wire-line transmission systems,” in Proc. ISCAS, May 2005, vol. 6, pp. 6006–6009.
[3] W. Ko, J. Kim, Y. Park, T. Koh, and D. Youn, “An efficient DMT modem for the G.LITE ADSL transceiver,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 997–1005, Dec. 2003.
[4] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM (de)modulation,” in Proc. Int. Symp. Signals Syst., Oct. 1998, pp. 257–262.
[5] M. Garrido, K. K. Parhi, and J. Grajal, “A pipelined FFT architecture for real-valued signals,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 12, pp. 2634–2643, Dec. 2009.
[6] M. Ayinala, M. Brown, and K. K. Parhi, “Pipelined parallel FFT architectures via folding transformation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 6, pp. 1068–1081, Jun. 2012.
[7] M. Ayinala and K. K. Parhi, “Parallel-pipelined radix-2² FFT architecture for real valued signals,” in Proc. Asilomar Conf. Signals, Syst. Comput., Nov. 2010, pp. 1274–1278.
[8] M. Ayinala and K. K. Parhi, “FFT architectures for real-valued signals based on radix-2³ and radix-2⁴ algorithms,” IEEE Trans. Circuits Syst. I, Reg. Papers, 2013.
[9] K. K. Parhi, C. Y. Wang, and A. P. Brown, “Synthesis of control circuits in folded pipelined DSP architectures,” IEEE J. Solid State Circuits, vol. 27, no. 1, pp. 29–43, 1992.
[10] M. Garrido, J. Grajal, M. A. Sánchez, and O. Gustafsson, “Pipelined radix-2ᵏ feedforward FFT architectures,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, Jan. 2013.
[11] V. G. Oklobdzija, “Design and analysis of fast carry-propagate adder under non-equal input signal arrival profile,” IEEE, 1994.
[12] Cha Min and E. E. Swartzlander, “Modified carry skip adder for reducing first block delay,” IEEE, 2000.
[13] R. W. Doran, “Variants of an improved carry look-ahead adder,” IEEE Transactions, 1988.
[14] J. O. Coleman and A. Yurdakul, “Fractions in the canonical-signed-digit number system,” in 2001 Conference on Information Sciences and Systems, The Johns Hopkins University, March 21–23, 2001.
[15] M. A. Soderstrand, “CSD multipliers for FPGA DSP applications,” in Proc. 2003 Int. Symp. Circuits and Systems (ISCAS '03), vol. 5.
[16] P. Saha, A. Banerjee, I. Banerjee, and A. Dandapat, “High speed low power floating point multiplier design based on CSD (canonical sign digit),” VDAT 2010.