FAST FOURIER TRANSFORM MODULE FOR IMPLEMENTATION IN
NIOS II EMBEDDED PROCESSOR
HISHAM AZHARI AHMED OSMAN
UNIVERSITI TEKNOLOGI MALAYSIA
A project report submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Electrical-Electronics & Telecommunication Engineering
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
MAY 2008
Specially dedicated to
my beloved mother and father
“Only those who dare to fail greatly, can ever achieve success greatly”
ACKNOWLEDGMENT
First and foremost, I am greatly indebted to Almighty Allah for giving me
endurance and strength to finish this project.
I would like to express my deepest gratitude to my supervisor,
Professor Dr. Mohamed Khalil bin Haji Mohd Hani, for giving me the chance to
explore new ground in the computer-aided design of electronic systems. His
encouragement, guidance, criticism and friendship carried this project
to a successful completion. In one year I have learnt a lot from him, not
only in my studies but also in the lessons of life.
My genuine appreciation goes to all those who contributed to the
completion of this project. In particular, I thank Mr. Illiasaak, Mohd Nazrin,
Alamin Ali and Mrs. Danaletchumi for their sincere guidance, as well as the members
of the research group in the VLSI-eCAD lab.
Special thanks also go to my colleagues at the faculty who gave valuable advice
and opinions on the project.
Finally, I would like to express my love and appreciation to my family
members for their support and help even though they were not around. I am also very
grateful to my friends for their encouragement, assistance and support
on various occasions.
ABSTRACT
The Fast Fourier Transform (FFT) is an indispensable algorithm in many digital
signal processing applications, yet it is considered computationally expensive
to implement in hardware. This thesis proposes a design and implementation of
the FFT algorithm in an embedded system, running on the Nios II embedded
processor and integrated with the Nios II floating-point custom instruction. The
design is based on the radix-2 Decimation-In-Time and Decimation-In-Frequency
algorithms for better performance and speed. For the hardware implementation, the
ALTERA Cyclone II EP2C35F672C6 (DE2 board) is used. For hardware interfacing, a
Graphical User Interface (GUI) has been developed in MATLAB; it is an original
method for interfacing between an ALTERA Field Programmable Gate Array (FPGA)
and software on the host PC. Input values are sent from MATLAB to the ALTERA
development board via the serial port, and the computed data are returned to MATLAB.
The purpose of this technique is to take advantage of MATLAB for analysing and
plotting the results.
ABSTRAK
The Fast Fourier Transform is an important algorithm in most digital signal
processing applications, yet it is regarded as computationally costly when it is
designed in hardware. This thesis proposes a design and implementation of the Fast
Fourier Transform algorithm in an embedded system, applied on the Nios II processor
and integrated with the Nios II floating-point custom instruction. The design is
based on radix-2 Decimation-In-Time and Decimation-In-Frequency to obtain better
and faster results. For the hardware implementation, the ALTERA CYCLONE II
EP2C35F672C6 (DE2 board) is used. For hardware interfacing, a Graphical User
Interface has been developed using MATLAB software; it is an original method of
creating a link between the ALTERA Field Programmable Gate Array (FPGA) and the
host software on the PC. Input values are sent from MATLAB to the ALTERA
development board through the serial port, and the computed data are sent back to
MATLAB. The purpose of this technique is to make use of the benefits of MATLAB in
analysis and in plotting the results obtained.
TABLE OF CONTENTS
CHAPTER  TITLE
DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDIX
1  INTRODUCTION
   1.1  Motivation
   1.2  Problem Statement
   1.3  Project Objectives
   1.4  Scope of Work
   1.5  Project Contribution
   1.6  Thesis Organization
2  FAST FOURIER TRANSFORM
   2.1  Introduction
   2.2  FFT Algorithms
        2.2.1  Radix-2 FFT Algorithms
               2.2.1.1  Decimation-In-Time FFT
               2.2.1.2  Decimation-In-Frequency FFT
   2.3  Algorithms Implementation
3  EMBEDDED SYSTEM DESIGN
   3.1  Project Procedure
   3.2  System Architecture
   3.3  Features Embedded Systems
        3.3.1  History and Future of Embedded System
        3.3.2  Real Time System
   3.4  Embedded Software of FFT Algorithm
        3.4.1  The Complex Class
        3.4.2  The Main Function
               3.4.2.1  The System Inputs
               3.4.2.2  The Earlier Stages
               3.4.2.3  The Final Stage
   3.5  Nios II based System on Chip Development Platform
        3.5.1  Nios II Custom Instruction
        3.5.2  Nios II Floating Point Custom Instruction
   3.6  Application Software
   3.7  ALTERA DE2 Development Kit
4  MATLAB GRAPHICAL USER INTERFACE
   4.1  Introduction
   4.2  Communication Interface
   4.3  Serial Port Overview
        4.3.1  Serial Communication
        4.3.2  The Serial Port Interface Standard
        4.3.3  Connecting Two Devices with a Serial Cable
   4.4  MATLAB Software
   4.5  Integration MATLAB with DE2 Board
5  RESULTS AND PERFORMANCE EVALUATION
   5.1  Introduction
   5.2  System Results
   5.3  Performance Evaluation
   5.4  Spectral Analysis
6  CONCLUSION
   6.1  Concluding Remarks
   6.2  Recommendation for Future Work
        6.2.1  Higher N-Point FFT Computation
        6.2.2  The Algorithm Architecture In The Decimation-In-Frequency
        6.2.3  High Radix Used
        6.2.4  Use of the System in Other Application
REFERENCES
Appendix A - C
LIST OF TABLES
TABLE NO.  TITLE
3.1  Main design steps in the project
3.2  Math library floating-point usage
5.1  Comparison between the system that includes the floating point custom
     instruction and the system not including the floating point custom
     instruction, for several functions
LIST OF FIGURES
FIGURE NO.  TITLE
2.1   Radix-2 for an N point FFT
2.2   First step in the decimation-in-time algorithm
2.3   Five stages in the computation of an N = 32-point DFT
2.4   Thirty two-point decimation-in-time FFT algorithm
2.5   Basic butterfly computation in the decimation-in-time FFT algorithm
2.6   Shuffling of the data and bit reversal
2.7   Thirty two-point decimation-in-frequency FFT algorithm
3.1   Project workflow
3.2   System architecture of the design
3.3   A generic embedded system
3.4   Nios II processor system
3.5   Custom instruction logic connects to the Nios II ALU
3.6   ALTERA DE2 board
3.7   Block diagram of the DE2 board
4.1   Communication interface between FFT module and MATLAB software
4.2   A male DE-9 connector used for a serial port on a PC style computer
4.3   Connecting two devices with a serial cable
4.4   Integration MATLAB with DE2 board
5.1   Ramp function
5.2   Output in MATLAB command window
5.3   The final output of ramp function
5.4   Step function
5.5   Final output of step function
5.6   Console view displaying Nios II hardware output using floating point
      custom instruction for step function
5.7   Console view displaying Nios II hardware output without floating point
      custom instruction for step function
5.8   Sinusoidal signal
5.8   Result of sinusoidal signal
6.1a) FFT for ramp discrete time signal N=32
6.1b) FFT for ramp discrete time signal N=1024
6.2a) Butterfly algorithms for DIF
6.2b) Butterfly algorithms for DIT
6.3   Thirty two-point Decimation-In-Frequency FFT algorithms
6.4   Basic butterfly computations in a radix-4 FFT algorithm
6.5   Radix-4 for a 16 point FFT
B.1   CPU window
B.2   JTAG UART window
B.3   Timer window
B.4   The system components
B.5   Floating point custom instruction location in Nios II embedded processor
B.6   Floating point hardware
B.7   Complete configuration of the floating point custom instruction
B.8   The entire components system in Quartus II
B.9   Interface Protocols
B.10  System library properties
B.11  Run the setting
B.12  Complete configuration of the serial connection
LIST OF APPENDIX
APPENDIX  TITLE
A  FFT code in C++
B  Building and configuration of the embedded system into Nios II embedded
   processor
C  MATLAB code for connection with DE2 board
CHAPTER 1
INTRODUCTION
This thesis proposes a design of the Fast Fourier Transform and implements it on
the Nios II embedded processor. This chapter covers the motivation, problem
statement, project objectives, scope of work, project contributions and, finally,
the thesis organization.
1.1 Motivation
The Fast Fourier Transform is a critical tool in digital signal processing, where
its value in analyzing the spectral content of signals has led to a wide variety of
applications. The most prevalent of these are in the field of communications,
where the ever-increasing demand for signal processing capability has made the
Fourier transform central to the field. The Fourier transform is also part of many
systems in other industrial and research fields. Its uses range from signal
processing for the analysis of physical phenomena to the analysis of data in
mathematical and financial systems.
The majority of systems requiring Fourier transforms are real-time systems,
which necessitate high-speed processing of data. Given the complexity of computing
the Discrete Fourier Transform, high-speed Fast Fourier Transform implementations
have required dedicated hardware processors. The majority of high-performance
Fourier transforms have required full-custom integrated circuits, typically in the
form of an application-specific integrated circuit (ASIC). Although much work has
been put into raising performance while reducing hardware requirements and cost,
the cost of full-custom hardware still limits the availability of Fourier transform
hardware for low-volume production.
Nevertheless, the development of programmable logic hardware has produced
devices that are increasingly capable of handling large-scale hardware. High-density
field programmable gate arrays (FPGAs) already available on the market can
boast upwards of 180,000 logic elements, nine megabits of memory, and on-board
processors.
Using an FPGA to implement hardware eliminates the long and costly process of
creating a full-custom integrated circuit, together with the time and cost of
testing and verification. This saves cost in design and testing, and shortens the
time from design to a functional device.
These features make the FPGA especially attractive for creating embedded
processors for research and development purposes.
However, the design of any embedded processor must consider two important
factors, efficiency and flexibility, to approach an ideal design.
1.2 Problem Statement
Efficiency and flexibility are two of the most important driving factors in
embedded system design. Efficient implementations are required to meet the tight
cost, timing, and power constraints present in embedded systems. Flexibility, albeit
tough to quantify, is equally important; it allows system designs to be easily
modified or enhanced in response to bugs, evolution of standards, market shifts, or
user requirements, during the design cycle and even after production.
Various implementation alternatives for a given function, ranging from
custom-designed hardware to software running on embedded processors, provide a
system designer with differing degrees of efficiency and flexibility. Unfortunately, it
is often the case that these are conflicting design goals. While efficiency is obtained
through custom hardwired implementations, flexibility is best provided through
programmable implementations.
Hardware/software partitioning, that is, separating a system's functionality into
embedded software (running on programmable processors) and custom hardware
(implemented as coprocessors or peripheral units), is one approach to achieving a
good balance between flexibility and efficiency.
1.3 Project Objectives
The aims of this project are as follows:
1. Design and implement the Fast Fourier Transform (FFT) algorithm in an
   embedded system by:
   a) Utilizing the Nios II embedded processor.
   b) Integrating it with the Nios II Floating Point Custom Instruction.
2. Develop a MATLAB user interface to verify the proposed FFT system.
1.4 Scope of Work
Taking into account the resources and time available, this project is narrowed
down to the following scope of work.
1. This project only considers a 32-point floating-point FFT. The
   Decimation-In-Time (DIT) algorithm is chosen.
2. The algorithm is implemented in the C++ language.
3. The Floating Point Custom Instruction is targeted at the Nios II platform and
   implemented on the ALTERA Cyclone II DE2 board.
4. A MATLAB Graphical User Interface (GUI) is used to interface with the FPGA
   hardware, providing inputs and displaying outputs.
5. A serial port (RS232) is used for transmitting and receiving data between the
   FPGA board and MATLAB.
6. The embedded system is applied to spectral analysis as an application.
1.5 Project Contributions
The most important contributions of this project are:
1. An integration framework for MATLAB and the ALTERA development kit platform.
2. Use of the Nios II Floating Point Custom Instruction in the design to increase
   performance and speed.
3. A simple protocol for interaction and communication between hardware and
   software via the computer serial port.
1.6 Thesis Organization
The thesis is organized into six chapters. The first chapter (this chapter)
presents the background of the work, the problem statement, project objectives,
scope of work and contributions of this project.
Chapter 2 introduces the Fast Fourier Transform. A derivation of the FFT is
given, concentrating on radix-2 algorithms.
Chapter 3 covers embedded systems and the methodology, tools, and techniques
used to carry out this project. Embedded systems are explained first, followed by
the methodology, the Nios II Floating Point Custom Instruction, and finally the
implementation of the FFT algorithm in C++.
Chapter 4 discusses the theoretical framework of the project. The chapter
starts with the system architecture. The following parts discuss the complete
configuration carried out on the Nios II embedded processor. The C++ code for the
connection between hardware and software is illustrated in this chapter. The last
section of the chapter presents the MATLAB commands for transmitting and
receiving the data.
Chapter 5 shows the system results and the Nios II results. All results are
appraised and compared.
The last chapter is the conclusion, which summarizes the work in this thesis.
Future work is also proposed, suggesting ways to improve and extend the current
design.
This thesis ends with the references and appendices, which contain all design
files of the project.
CHAPTER 2
FAST FOURIER TRANSFORM (FFT)
2.1 Introduction
In this chapter, several methods for computing the Discrete Fourier
Transform (DFT) efficiently are presented. In view of the importance of the DFT in
various digital signal processing applications, such as linear filtering, correlation
analysis, and spectrum analysis, its efficient computation is a topic that has received
considerable attention by many mathematicians, engineers, and applied scientists.
Basically, the computational problem for the DFT is to compute the sequence
{X(k)} of N complex-valued numbers given another sequence of data {x(n)} of
length N, according to the following formula:
\[ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \qquad 0 \le k \le N-1 \]
where
\[ W_N = e^{-j 2\pi / N} \]
In general, the data sequence x(n) is also assumed to be complex valued.
Similarly, the Inverse Discrete Fourier Transform (IDFT) becomes:
\[ x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-kn}, \qquad 0 \le n \le N-1 \]
We can observe that for each value of k, direct computation of X(k) involves
N complex multiplications (4N real multiplications) and N − 1 complex additions
(4N − 2 real additions). Consequently, computing all N values of the DFT requires
$N^2$ complex multiplications and $N^2 - N$ complex additions.
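The operation count above can be read off a direct implementation of the DFT sum. The following is a minimal sketch, assuming `std::complex<float>` as the sample type; the function name `dft_direct` is hypothetical and this is not the code from Appendix A. The two nested loops of length N give the $N^2$ complex multiplications.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Direct evaluation of X(k) = sum_{n=0}^{N-1} x(n) * W_N^{kn},
// with W_N = exp(-j*2*pi/N). Cost: N*N complex multiplications.
std::vector<std::complex<float>> dft_direct(const std::vector<std::complex<float>>& x)
{
    const std::size_t N = x.size();
    const float two_pi = 6.28318530717958647692f;
    std::vector<std::complex<float>> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<float> sum(0.0f, 0.0f);
        for (std::size_t n = 0; n < N; ++n) {
            // W_N^{kn} is periodic in kn with period N, so reduce the exponent first.
            const float angle =
                -two_pi * static_cast<float>((k * n) % N) / static_cast<float>(N);
            sum += x[n] * std::polar(1.0f, angle);
        }
        X[k] = sum;
    }
    return X;
}
```

Such a routine is mainly useful as a reference against which a faster FFT implementation can be checked.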
Direct computation of the DFT is inefficient primarily because it does not
exploit the symmetry and periodicity properties of the phase factor $W_N$. In
particular, these two properties are:
Symmetry property:
\[ W_N^{k+N/2} = -W_N^{k} \]
Periodicity property:
\[ W_N^{k+N} = W_N^{k} \]
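Both properties follow in one line from the definition $W_N = e^{-j2\pi/N}$; the short derivation is given here for completeness:

```latex
\begin{align*}
W_N^{k+N/2} &= e^{-j2\pi (k + N/2)/N} = e^{-j2\pi k/N}\, e^{-j\pi} = -W_N^{k}, \\
W_N^{k+N}   &= e^{-j2\pi (k + N)/N}   = e^{-j2\pi k/N}\, e^{-j2\pi} = W_N^{k}.
\end{align*}
```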
The computationally efficient algorithms described in this section, known
collectively as fast Fourier transform (FFT) algorithms, exploit these two basic
properties of the phase factor.
2.2 FFT Algorithms
The FFT exists in two functionally equivalent forms known as decimation in
time (DIT) and decimation in frequency (DIF). Both decompose the DFT into
computational units that each process r samples, reducing the computational
complexity of the DFT from $O(N^2)$ to $O(N \log N)$. The various algorithms that
result are collectively known as radix-r Fast Fourier Transforms.
The most popular choices of radix are r = 2 and r = 4, and a commonly used
refinement of the FFT is the use of a mixed radix.
2.2.1 Radix-2 FFT Algorithms
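The radix-2 algorithm takes the DFT and applies a common-factor reduction,
equating the sum of two N/2-point sequences to the N-point sequence of the original
DFT. This results in the radix-2 FFT formula below:
\[ X[m + 2k] = \sum_{n=0}^{N/2-1} \left( x[n] + (-1)^{m}\, x\!\left[n + \tfrac{N}{2}\right] \right) W_N^{mn}\, W_{N/2}^{kn}, \qquad m = 0, 1 \]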
This results in processing that follows the signal flow graph shown in Figure 2.1:
Figure 2.1: Radix-2 for an N point FFT
There are two radix-2 FFT algorithms: the decimation-in-time (DIT) FFT algorithm
and the decimation-in-frequency (DIF) FFT algorithm.
2.2.1.1 Decimation-In-Time FFT
Let us consider the computation of the $N = 2^v$ point DFT by the
divide-and-conquer approach. We split the N-point data sequence into two N/2-point
data sequences $f_1(n)$ and $f_2(n)$, corresponding to the even-numbered and
odd-numbered samples of x(n), respectively, that is,
\[ f_1(n) = x(2n), \qquad f_2(n) = x(2n+1), \qquad n = 0, 1, \ldots, \frac{N}{2} - 1 \]
Thus $f_1(n)$ and $f_2(n)$ are obtained by decimating x(n) by a factor of 2, and
hence the resulting FFT algorithm is called a decimation-in-time algorithm.
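Now the N-point DFT can be expressed in terms of the DFTs of the
decimated sequences as follows:
\[ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn} = \sum_{m=0}^{N/2-1} x(2m)\, W_N^{2mk} + \sum_{m=0}^{N/2-1} x(2m+1)\, W_N^{k(2m+1)}, \qquad k = 0, 1, \ldots, N-1 \]
But $W_N^2 = W_{N/2}$. With this substitution, the equation can be expressed as
\[ X(k) = \sum_{m=0}^{N/2-1} f_1(m)\, W_{N/2}^{km} + W_N^{k} \sum_{m=0}^{N/2-1} f_2(m)\, W_{N/2}^{km} = F_1(k) + W_N^{k} F_2(k), \qquad k = 0, 1, \ldots, N-1 \]
where $F_1(k)$ and $F_2(k)$ are the N/2-point DFTs of the sequences $f_1(m)$ and
$f_2(m)$, respectively.
Since $F_1(k)$ and $F_2(k)$ are periodic, with period N/2, we have
$F_1(k+N/2) = F_1(k)$ and $F_2(k+N/2) = F_2(k)$. In addition, the factor
$W_N^{k+N/2} = -W_N^{k}$. Hence the equation may be expressed as follows:
\[ X(k) = F_1(k) + W_N^{k} F_2(k), \qquad k = 0, 1, \ldots, \frac{N}{2} - 1 \]
\[ X\!\left(k + \tfrac{N}{2}\right) = F_1(k) - W_N^{k} F_2(k), \qquad k = 0, 1, \ldots, \frac{N}{2} - 1 \]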
We observe that the direct computation of $F_1(k)$ requires $(N/2)^2$ complex
multiplications. The same applies to the computation of $F_2(k)$. Furthermore, there
are N/2 additional complex multiplications required to compute $W_N^{k} F_2(k)$.
Hence the computation of X(k) requires $2(N/2)^2 + N/2 = N^2/2 + N/2$ complex
multiplications. This first step results in a reduction of the number of
multiplications from $N^2$ to $N^2/2 + N/2$, which is about a factor of 2 for large
N. Figure 2.2 shows the first step in the decimation-in-time algorithm.
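As a worked example of these counts, take the N = 32 case considered in this project. The numbers below are plain arithmetic on the expressions in the text:

```latex
\begin{align*}
N^2 &= 32^2 = 1024 && \text{(direct computation)} \\
2\left(\tfrac{N}{2}\right)^2 + \tfrac{N}{2} &= 2 \cdot 16^2 + 16 = 528 && \text{(after the first decimation step)}
\end{align*}
```

The ratio 1024/528 is roughly 1.94, which agrees with the "about a factor of 2" estimate.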
Figure 2.2: First step in the decimation-in-time algorithm.
By computing N/4-point DFTs, we would obtain the N/2-point DFTs $F_1(k)$ and
$F_2(k)$ by applying the same decomposition to $f_1(n)$ and $f_2(n)$.
The decimation of the data sequence can be repeated again and again until the
resulting sequences are reduced to one-point sequences. For $N = 2^v$, this
decimation can be performed $v = \log_2 N$ times. Thus the total number of complex
multiplications is reduced to $(N/2)\log_2 N$. The number of complex additions is
$N \log_2 N$.
For illustration, Figure 2.3 depicts the computation of an N = 32 point DFT. We
observe that the computation is performed in five stages, beginning with the
computation of sixteen two-point DFTs, then eight four-point DFTs, then four
eight-point DFTs, then two sixteen-point DFTs and finally one thirty two-point DFT.
The combination of the smaller DFTs to form the larger DFT is illustrated in
Figure 2.4 for N = 32. Figure 2.5 shows the basic butterfly computation in the
decimation-in-time FFT algorithm.
Figure 2.3: Five stages in the computation of an N = 32-point DFT
Figure 2.4: Thirty two-point decimation-in-time FFT algorithm.
Figure 2.5: Basic butterfly computation in the decimation-in-time FFT algorithm.
An important observation is the order of the input data sequence after it is
decimated (v − 1) times.
For example, if we consider the case where N = 8, we know that the first
decimation yields the sequence x(0), x(2), x(4), x(6), x(1), x(3), x(5), x(7), and the
second decimation results in the sequence x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7).
This shuffling of the input data sequence has a well-defined order, as can be
ascertained from Figure 2.6, which illustrates the decimation of the eight-point
sequence.
Figure 2.6: Shuffling of the data and bit reversal.
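The shuffled order is the bit-reversed order of the sample index: writing n in binary with v bits and reversing the bits gives the position from which x(n) is read. A minimal sketch follows; the function names `reverse_bits` and `bit_reversed_order` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Reverse the lowest `bits` bits of `index` (e.g. with 3 bits: 001 -> 100).
std::size_t reverse_bits(std::size_t index, unsigned bits)
{
    std::size_t reversed = 0;
    for (unsigned b = 0; b < bits; ++b) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

// Sample indices in the shuffled (bit-reversed) order for N = 2^bits points.
std::vector<std::size_t> bit_reversed_order(unsigned bits)
{
    const std::size_t N = static_cast<std::size_t>(1) << bits;
    std::vector<std::size_t> order(N);
    for (std::size_t n = 0; n < N; ++n) {
        order[n] = reverse_bits(n, bits);
    }
    return order;
}
```

For N = 8, `bit_reversed_order(3)` yields 0, 4, 2, 6, 1, 5, 3, 7, which matches the sequence obtained after the second decimation in the text.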
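2.2.1.2 Decimation-In-Frequency FFT
Another important radix-2 FFT algorithm, called the decimation-in-frequency
algorithm, is also obtained by using the divide-and-conquer approach. To derive the
algorithm, we begin by splitting the DFT formula into two summations, one of which
involves the sum over the first N/2 data points and the other the sum over the last
N/2 data points. Thus we obtain
\[ X(k) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{kn} + \sum_{n=N/2}^{N-1} x(n)\, W_N^{kn} = \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^{k}\, x\!\left(n + \tfrac{N}{2}\right) \right] W_N^{kn} \]
since $W_N^{kN/2} = (-1)^{k}$.
Now, let us split (decimate) X(k) into the even- and odd-numbered samples.
Thus we obtain
\[ X(2k) = \sum_{n=0}^{N/2-1} \left[ x(n) + x\!\left(n + \tfrac{N}{2}\right) \right] W_{N/2}^{kn}, \qquad k = 0, 1, \ldots, \frac{N}{2} - 1 \]
\[ X(2k+1) = \sum_{n=0}^{N/2-1} \left\{ \left[ x(n) - x\!\left(n + \tfrac{N}{2}\right) \right] W_N^{n} \right\} W_{N/2}^{kn}, \qquad k = 0, 1, \ldots, \frac{N}{2} - 1 \]
where we have used the fact that $W_N^2 = W_{N/2}$.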
The computational procedure above can be repeated through decimation of
the N/2-point DFTs X(2k) and X(2k+1). The entire process involves $v = \log_2 N$
stages of decimation, where each stage involves N/2 butterflies. Consequently,
the computation of the N-point DFT via the decimation-in-frequency FFT requires
$(N/2)\log_2 N$ complex multiplications and $N \log_2 N$ complex additions, just as
in the decimation-in-time algorithm. For illustration, the thirty two-point
decimation-in-frequency algorithm is given in Figure 2.7.
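For comparison with the decimation-in-time form, the decimation-in-frequency procedure can be sketched recursively as well. This is an illustrative sketch under the same assumptions (`std::complex<float>` samples, power-of-two length); the name `fft_dif` is hypothetical. Here the butterflies are applied before the two half-length transforms, and the outputs are interleaved as the even- and odd-indexed samples of X(k).

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Recursive radix-2 decimation-in-frequency FFT.
// Assumes x.size() is a power of two (N = 2^v).
std::vector<cfloat> fft_dif(const std::vector<cfloat>& x)
{
    const std::size_t N = x.size();
    if (N <= 1) return x;

    // Butterflies on the input:
    //   g1(n) = x(n) + x(n + N/2)
    //   g2(n) = [x(n) - x(n + N/2)] * W_N^n
    const float two_pi = 6.28318530717958647692f;
    std::vector<cfloat> g1(N / 2), g2(N / 2);
    for (std::size_t n = 0; n < N / 2; ++n) {
        const cfloat w =
            std::polar(1.0f, -two_pi * static_cast<float>(n) / static_cast<float>(N));
        g1[n] = x[n] + x[n + N / 2];
        g2[n] = (x[n] - x[n + N / 2]) * w;
    }

    const std::vector<cfloat> even = fft_dif(g1);  // X(2k)
    const std::vector<cfloat> odd = fft_dif(g2);   // X(2k + 1)

    // Interleave the even- and odd-numbered frequency samples.
    std::vector<cfloat> X(N);
    for (std::size_t k = 0; k < N / 2; ++k) {
        X[2 * k] = even[k];
        X[2 * k + 1] = odd[k];
    }
    return X;
}
```

Both sketches perform the same number of butterflies; they differ only in whether the twiddle-factor multiplication comes before or after the add/subtract pair.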
Figure 2.7: Thirty two-point decimation-in-frequency FFT algorithm
2.3 Algorithms Implementation
In this project, we implemented both methods, Decimation-In-Time (DIT) and
Decimation-In-Frequency (DIF), to determine which is better in efficiency, speed,
performance and delay. The following researchers are known to have applied the
same methods:
Nabeel Shirazi and Peter M. Athanas (Institute and State University Bradley 2004)
applied the Fast Fourier Transform algorithm to image processing. The design was
based on the decimation-in-time radix-2 algorithm. They obtained good results with
high computation speed and low delay.
Weidong Li, Jonas Carlsson, Jonas Claeson, and Lars Wanhammar
(Electronics Systems, Department of Electrical Engineering, Linköping University)
employed the Fast Fourier Transform algorithm in a Globally Asynchronous Locally
Synchronous (GALS) design based on the decimation-in-frequency radix-2 algorithm.
Their simulation showed that DIF has high performance and efficiency.
Mohd Nazrin (UTM 2004) applied the Fast Fourier Transform algorithm in FPGA
technology. The design was based on the decimation-in-time radix-2 algorithm. From
his simulation and results he concluded that “DIT has many advantages such as
high efficiency, speed, performance and low delay.”
Both methods give the same results, but we are interested in performance,
speed, hardware cost and efficiency. This thesis examines the advantages and
disadvantages of both the DIT and the DIF.
CHAPTER 3
EMBEDDED SYSTEM DESIGN
3.1 Project Procedure
The main work of this project is to implement the Fast Fourier Transform in an
embedded system. The workflow is shown in Figure 3.1, with details of the main
steps summarized in Table 3.1. Before the actual design was performed, a literature
review was conducted, mostly from IEEE and textbook sources, to get an overview of
previous work and the theoretical background. Then the problem formulation and
scope, which form the basis of the design of the Fast Fourier Transform, were
identified. The work proceeded to obtain the embedded system architecture, which
consists of three parts: hardware design, embedded software and application
software.
The final step is the analysis and discussion of all results obtained, in
particular the performance and speed of the design, and the evaluation of the
embedded system.
Figure 3.1: Project workflow
Table 3.1: Main design steps in the project
Design Steps                                                  Tools
Implement FFT algorithm in C++ language                       Bloodshed Dev-C++ software
Create the embedded system in the Altera Nios II              Altera Nios II SOPC Builder
system-on-a-programmable-chip (SOPC)
Upload the system to the FPGA (DE2 board)                     Quartus II (Programmer)
Simulate and upload the FFT code to the FPGA (DE2 board)      Nios II IDE
Create the serial connection between the hardware             Nios II IDE
and the software
Graphical user interface                                      MATLAB
3.2 System Architecture
Figure 3.2: System architecture of the design
The system architecture of the design is shown in Figure 3.2 above.
According to this figure, there are two main parts. The first is the FFT embedded
processor: two embedded programs written in C++ are executed on the Nios II
platform, in which the floating-point custom instruction is configured. The second
is MATLAB, which was chosen as the application program on the host PC to provide
data to and receive data from the SoC on the FPGA development board. The
FFT SoC hardware is connected to the host PC via the RS232 UART serial
communication protocol.
3.3 Features Embedded Systems
An embedded system is a combination of computer hardware and software,
and perhaps additional mechanical or other parts, designed to perform a specific
function. A good example is the microwave oven. Almost every household has one,
and tens of millions of them are used every day, but very few people realize that a
processor and software are involved in the preparation of their lunch or dinner.
This is in direct contrast to the personal computer in the family room. It too is
comprised of computer hardware and software and mechanical components (disk
drives, for example). However, a personal computer is not designed to perform a
specific function. Rather, it is able to do many different things. Many people use the
term general-purpose computer to make this distinction clear. As shipped, a general-purpose computer is a blank slate; the manufacturer does not know what the
customer will do with it. One customer may use it for a network file server, another
may use it exclusively for playing games, and a third may use it only for word
processing.
Often, an embedded system is a component within some larger
system. For example, modern cars and trucks contain many embedded systems. One
embedded system controls the anti-lock brakes, another monitors and controls the
vehicle's emissions, and a third displays information on the dashboard. In some
cases, these embedded systems are connected by some sort of a communications
network, but that is certainly not a requirement.
To avoid confusion, it is important to point out that a general-purpose
computer is itself made up of numerous embedded systems. For example, my
computer consists of a keyboard, mouse, video card, modem, hard drive, floppy
drive, and sound card, each of which is an embedded system. Each of these devices
contains a processor and software and is designed to perform a specific function. For
example, the modem is designed to send and receive digital data over an analogue
telephone line. That's it. And all of the other devices can be summarized in a single
sentence as well.
If an embedded system is designed well, the existence of the processor and
software could be completely unnoticed by a user of the device. Such is the case for a
microwave oven, VCR, or alarm clock. In some cases, it would even be possible to
build an equivalent device that does not contain the processor and software. This
could be done by replacing the combination with a custom integrated circuit that
performs the same functions in hardware. However, a lot of flexibility is lost when a
design is hard-coded in this way. It is much easier, and cheaper, to change a few lines
of software than to redesign a piece of custom hardware. A generic embedded system
is shown in Figure 3.3.
Figure 3.3: A Generic embedded system
3.3.1 History and Future of Embedded Systems
The first such systems could not possibly have appeared before 1971. That
was the year Intel introduced the world's first microprocessor. This chip, the 4004,
was designed for use in a line of business calculators produced by the Japanese
company Busicom. In 1969, Busicom asked Intel to design a set of custom integrated
circuits, one for each of their new calculator models. The 4004 was Intel's response.
Rather than design custom hardware for each calculator, Intel proposed a
general-purpose circuit that could be used throughout the entire line of calculators.
This general-purpose processor was designed to read and execute a set of
instructions (software) stored in an external memory chip. Intel's idea was that the
software would give each calculator its unique set of features.
The microprocessor was an overnight success, and its use increased steadily
over the next decade. Early embedded applications included unmanned space probes,
computerized traffic lights, and aircraft flight control systems. In the 1980s,
embedded systems quietly rode the waves of the microcomputer age and brought
microprocessors into every part of our personal and professional lives. Many of the
electronic devices in our kitchens (bread machines, food processors, and microwave
ovens), living rooms (televisions, stereos, and remote controls), and workplaces (fax
machines, pagers, laser printers, cash registers, and credit card readers) are embedded
systems.
It seems inevitable that the number of embedded systems will continue to
increase rapidly. Already there are promising new embedded devices that have
enormous market potential: light switches and thermostats that can be controlled by a
central computer, intelligent air-bag systems that don't inflate when children or small
adults are present, palm-sized electronic organizers and personal digital assistants
(PDAs), digital cameras, and dashboard navigation systems. Clearly, individuals who
possess the skills and desire to design the next generation of embedded systems will
be in demand for quite some time.
3.3.2 Real Time System
One subclass of embedded systems is worthy of an introduction at this point.
As commonly defined, a real-time system is a computer system that has timing
constraints. In other words, a real-time system is partly specified in terms of its
ability to make certain calculations or decisions in a timely manner. These important
calculations are said to have deadlines for completion. And, for all practical
purposes, a missed deadline is just as bad as a wrong answer.
The issue of what happens if a deadline is missed is a crucial one. For
example, if the real-time system is part of an airplane's flight control system, it is
possible for the lives of the passengers and crew to be endangered by a single missed
deadline. However, if instead the system is involved in satellite communication, the
damage could be limited to a single corrupt data packet. The more severe the
consequences, the more likely it will be said that the deadline is "hard" and, thus, the
system a hard real-time system. Real-time systems at the other end of this continuum
are said to have "soft" deadlines.
3.4 Embedded Software Implementation of FFT Algorithm
The code accepts 32 inputs and performs a radix-2, decimation-in-time Fast Fourier
Transform on them. The transform outputs are complex numbers. In addition, the
code calculates the amplitude and the phase of the results for spectral analysis.
The FFT algorithm was implemented in C++.
3.4.1 The Complex Class
First we needed to define a new type in order to handle complex numbers, as the
language has no built-in type for them. Hence we declared a class for complex
numbers, named Cmplx in the code, with two member variables of type float, r and i,
representing the real and imaginary parts of the complex number respectively.
Then we defined two constructors. The default constructor Cmplx() initializes r and
i to zero so that no unwanted values appear while the program runs.
The second constructor, Cmplx(float real,float im), initializes the complex number
to a given value: r receives the value of the argument real and i the value of im.
Next, we wanted those complex numbers to support arithmetic operations (addition,
subtraction and multiplication). User-defined types cannot use the operators
(+, - and *) by default, so we had to overload those operators in the functions
Cmplx operator+(Cmplx y){return Cmplx(r+y.r,i+y.i);}
Cmplx operator-(Cmplx y){return Cmplx(r-y.r,i-y.i);}
Cmplx operator*(Cmplx y)
{
float real, im;
real=(r*y.r)-(i*y.i);
im=(r*y.i)+(i*y.r);
return Cmplx(real,im);
}
A binary operator such as + takes one argument that represents the value on its
right; the value on its left is the object itself. So we passed the second number,
which is also a complex number, as the argument and declared the return value as a
complex number whose real part is the sum of r (real part of the first number) and
y.r (real part of the second) and whose imaginary part is the sum of i and y.i.
The same method was applied to the - operator.
As for the * operator, we used temporary values (float real, im) to hold the results
of the multiplication, since both parts of the product depend on both parts of each
operand and the operands themselves must not be changed. We returned the complex
number built from those temporary values.
Finally, to be able to use cout with Cmplx we overloaded the stream insertion
operator << for ostream, with the argument Cmp of type Cmplx, as shown below
friend ostream &operator<<( ostream &output, Cmplx &Cmp )
{
if(Cmp.r==0&&Cmp.i==0)output<<"0";
if(Cmp.r!=0)output<<Cmp.r;
if(Cmp.i>0)output<<"+"<<Cmp.i<<"i";
if(Cmp.i<0)output<<"-"<<-Cmp.i<<"i";
output<<" ";
return output;
}
The basic form is a+bi, with the following exceptions when printing the complex
number:
- If both the real and imaginary parts are zero, print 0 instead of 0+0i.
- If the real part is non-zero, print it; a zero real part is not printed.
- If the imaginary part is positive, print + followed by the imaginary part and then
i; if it is negative, print - followed by its magnitude and then i.
With the conditions in that order, the imaginary part is not printed when it equals
zero. Finally the function returns the output stream.
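The printing rules listed above can be checked with a small sketch. The helper name format_complex is hypothetical and is not part of the project code; it returns the text that the overloaded << operator would produce for a given real and imaginary part.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper mirroring the printing rules of the overloaded
// operator<< described above; it returns the text instead of printing it.
std::string format_complex(float r, float i)
{
    std::ostringstream out;
    if (r == 0 && i == 0) out << "0";        // avoid printing 0+0i
    if (r != 0) out << r;                    // a zero real part is skipped
    if (i > 0) out << "+" << i << "i";       // positive imaginary part
    if (i < 0) out << "-" << -i << "i";      // negative imaginary part
    out << " ";                              // separator between values
    return out.str();
}
```

With these rules a purely imaginary positive value such as 2i is printed with a leading plus sign, as "+2i".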
Finally we added two more functions to calculate the amplitude and the phase, using
the built-in math functions sqrt( ) and atan( ). When using atan( ) we considered
the following:
- If the imaginary part is zero, return zero, because in practice zero values
caused unwanted results.
- Otherwise calculate atan( i/r ), which gives an angle in radians, so we
multiplied the result by 180 and then divided it by PI to obtain degrees.
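A minimal sketch of the two calculations follows. The function names are hypothetical and free functions are used instead of class members for brevity. The third function is an assumption-labelled alternative, not the method used in this project: atan( i/r ) only returns angles between -90 and 90 degrees, whereas atan2 also distinguishes the quadrant.

```cpp
#include <cmath>

const double PI = 3.14159265358979323846;

// Amplitude |z| = sqrt(r^2 + i^2).
float amplitude(float r, float i)
{
    return std::sqrt(r * r + i * i);
}

// Phase in degrees following the rule described in the text:
// zero when the imaginary part is zero, otherwise atan(i/r) * 180 / PI.
float phase_deg(float r, float i)
{
    if (i == 0) return 0.0f;
    return static_cast<float>(std::atan(i / r) * 180.0 / PI);
}

// Alternative (an assumption, not the project's code): atan2 keeps the
// quadrant information, so -1+1i gives 135 degrees rather than -45.
float phase_deg_atan2(float r, float i)
{
    return static_cast<float>(std::atan2(i, r) * 180.0 / PI);
}
```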
3.4.2 The Main Function
To simplify the discussion, let us examine an 8-input Fourier transform and then
apply our findings to the 32-input version. We start by calculating the Ws from the
following formula:
$W_N = e^{-j 2\pi / N}$
Then, for N = 32,
$W_{32}^{0} = 1$
$W_{32}^{1} = \cos(\pi/16) - j\sin(\pi/16) = 0.98 - j0.195$
$W_{32}^{2} = \cos(\pi/8) - j\sin(\pi/8) = 0.923 - j0.38$
and so on in the same way. Due to the symmetry property, not all N factors need to
be calculated.
We declared a variable (i) to represent the index of the current W, where i takes
the values 0,1,2,3,…
As for the exponent of e, we noticed that it equals a constant (-jπ/n) multiplied by
a variable x, where x takes the values 0,2,4,6,…; to reduce the number of variables
we expressed x in terms of i, which gave us x=2*i. Since we dealt with complex
numbers we used the expansion
W (i) = cos (2iπ/n) - j sin (2iπ/n)
and since the trigonometric functions in C++ use radians we defined π as
#define PI 3.14159265358979323846264338327950288419716939937510
to give good accuracy.
The final step was to write it in a for loop where i starts at 0 and stops before
n/2 (n is the number of inputs)
for(i=0;i<n/2;i++)
w[i]=Cmplx(cos(i*2*PI/n),-sin(i*2*PI/n));
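The loop above can be sketched as a self-contained function. This is only an illustration under stated assumptions: it uses std::complex<float> from the standard library in place of the project's Cmplx class, and the function name make_twiddles is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Returns the first n/2 factors W(i) = cos(2*pi*i/n) - j*sin(2*pi*i/n).
std::vector<std::complex<float>> make_twiddles(std::size_t n)
{
    assert(n >= 2 && n % 2 == 0);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<float>> w(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        const double angle =
            2.0 * pi * static_cast<double>(i) / static_cast<double>(n);
        w[i] = std::complex<float>(static_cast<float>(std::cos(angle)),
                                   static_cast<float>(-std::sin(angle)));
    }
    return w;
}
```

For n = 32 the second entry, w[1], is approximately 0.98 - j0.195, which agrees with the value calculated by hand above.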
3.4.2.1 The System Inputs
We declared a two-dimensional Cmplx array to hold the inputs as well as the outputs
of each stage. We already knew that the 32-point transform has 5 stages, hence we
declared Cmplx x[6][32] in line 42, where x[0][] holds the inputs. That enabled us
to store the inputs as complex numbers to help with the calculations, as seen in
the line
x[stage][i]=Cmplx(buff,0);
We also declared two variables: stage to hold the current stage number and
totalStages to hold the total number of stages, which in our case is 5. Keeping
totalStages as a variable allows the program to be updated to work with a different
number of inputs.
3.4.2.2 The Earlier Stages
So far we had calculated the inputs and the Ws. Next we worked out the stages,
leaving the very last one until later.
Stage 1
x (0) + x (16)=1→ x’(0)
x (1) + x (17)=1→ x’(1)
x (2) + x (18)=1→ x’(2)
.
.
and so on until stage 1 is completed.
By observing the formulas for each stage we found that there are two main
equations:
X (stage) = x (stage-1) + x (stage-1)
X (stage) = [x (stage-1) - x (stage-1)]*w
Since only one of the two is used at any time, we needed a variable with only two
values: while it holds the first value we work on the first formula, and when that
is finished we flip it to the other value and move on to the second formula.
We declared a variable b of type bool with the initial value false, which selects
the first formula; when finished we set b = !b to give the value true.
for ( i=0 ; i<n ; )
{
if (b == false)
{
// first formula
b=!b;
}
else
{
// second formula
b=!b;
}
}//end for ( )
We prepared the loop with a counter i initialized to 0 and limited to n=32, the
number of inputs. Returning to stage 1, we found that the first formula is used for
half of the total count before we flip to the second one. By contrast, in stage 2 it
is used for a quarter of the count, and in stage 3 for an eighth. So as the stage
goes 1,2,3,4 the number of times each equation is used before moving to the other
is n/2, n/4, n/8, n/16.
Using a variable d to hold the values 2, 4, 8, 16 we found d = 2^stage, or
d = (pow(2,stage)); where pow is a predefined function in the math header that
returns the power as type double. Since the variable is used in the loop as an
integer, and would otherwise be incompatible with the other variables, we converted
it using the cast operator.
d = static_cast<int> (pow(2,stage));
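As a worked check of these values, the sketch below computes d and the run length n/d for a given stage. The struct and function names are hypothetical. A bit shift is used here instead of pow( ) as an alternative that stays in integer arithmetic; the project code itself uses pow( ) with a cast.

```cpp
#include <cassert>

// Hypothetical helper: per-stage loop parameters for an n-point transform.
struct StageParams {
    int d;    // 2^stage
    int run;  // n/d: how often each formula is used before switching
};

StageParams stage_params(int n, int stage)
{
    assert(stage >= 1 && stage < 31);
    StageParams p;
    p.d = 1 << stage;    // same value as static_cast<int>(pow(2, stage))
    p.run = n / p.d;
    return p;
}
```

For n = 32 this gives run lengths of 16, 8, 4, 2 and 1 for stages 1 to 5.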
Updating the loop adding inner loops for each formula with a counter f
for( i=0 ; i<n ; )
{
if(b == false)
{
for(f=0 ; f < n/d ; f++)
{
x[stage][i]=x[stage-1][ ]+x[stage-1][ ];
i++;
}
b=!b;
}
else
{
for(f=0 ; f < n/d ; f++)
{
x[stage][i]=(x[stage-1][ ]-x[stage-1][ ])*w[ ] ;
i++;
}
b=!b;
}
}//end for( )
Notice that we included the increment of i inside each of the inner loops so that
the count continues with each calculation. Otherwise, in stage 1, i would equal only
1 after finishing half of the equations!
At that point we needed index variables for each x and w to suit each equation. By
examining all the stages we found that the index of the first x in the equation
always starts from 0 and increases by 1; when switching to formula (2) it decreases
by n/d, and when switching back to formula (1) it increases by n/d. The same applies
to the second x, except that its index equals the first plus n/d. Applying a
variable j in the loop we got:
for(f=0 ; f < n/d ; f++)
{
x[stage][i]=x[stage-1][ j ]+x[stage-1][ j+n/d ];
j++;
i++;
}
b=!b;
j -= n/d;
As for the index of w, in stage 1 it takes the values 0, 1, 2, 3, …, one for each
of the n/d counts, which is the same as f. In the later stages it takes the values:
Stage 2: 0,2,4,6,…
Stage 3: 0,4,8,12
Stage 4: 0,8
Noticing the change in each stage, we found that the index equals f * 2^(stage-1).
After declaring the variable df to hold the value 2^(stage-1), the second loop
became:
for (f=0 ; f < n/d ; f++)
{
x[stage][i]=(x[stage-1][ j ]-x[stage-1][ j+n/d ])*w[f*df];
j++;
i++;
}
b=!b; j += n/d;
and in line 65
df = static_cast<int>(pow(2,stage-1));
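The two inner loops can be summarised in one self-contained sketch of a single stage. It rests on stated assumptions: std::complex<float> replaces the Cmplx class, the function name butterfly_stage is hypothetical, and the flag b with the adjustments of j is replaced by an outer loop over blocks of 2*(n/d) elements, which visits the same index pairs. The parameter half plays the role of n/d and step the role of df.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> Cplx;

// One stage: for every block of 2*half inputs, the first half of the
// outputs uses the sum formula and the second half uses the
// difference-times-w formula, with w indexed by f*step.
void butterfly_stage(const std::vector<Cplx>& in, std::vector<Cplx>& out,
                     const std::vector<Cplx>& w,
                     std::size_t half, std::size_t step)
{
    const std::size_t n = in.size();
    assert(out.size() == n && half > 0 && n % (2 * half) == 0);
    for (std::size_t base = 0; base < n; base += 2 * half) {
        for (std::size_t f = 0; f < half; ++f) {
            const Cplx a = in[base + f];          // x[stage-1][j]
            const Cplx b = in[base + f + half];   // x[stage-1][j+n/d]
            out[base + f] = a + b;                         // first formula
            out[base + f + half] = (a - b) * w[f * step];  // second formula
        }
    }
}
```

For a 4-point example with inputs 1, 2, 3, 4, half = 2, step = 1 and w = {1, -j}, the stage produces 4, 6, -2 and 2j.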
3.4.2.3 The Final Stage
The difference in the last stage is in the left-hand side of the equation,
x[stage][i]: the output index is not actually i. Instead, for 32 inputs it follows
the sequence 0-16-8-24-4-20-12-28-2-18-10-26-6-22-14-30-1-17-9-25-5-21-13-29-3-19-
11-27-7-23-15-31, which is the bit-reversed order of the indices 0 to 31.
We changed that index to a variable k. In the last stage we move between our two
loops on every count. After the first loop k increases by n/2. After the second
loop, however, k decreases by amounts in the order 8-20-8-26-8-20-8, and the
sequence then repeats for the second half. So the change is either k-n/4 or another
number.
We kept k-n/4 and added a variable z that is either 0 or 1 to add the next change.
Examining the 8, 16 and 32 input cases, that value is 3n/8, which when multiplied
by z gives the desired result for the decrement of 20 (n/4 + 3n/8 = 20 for n = 32),
but another number still had to be added to obtain 26.
We therefore added another counter q, starting from 0; when it reaches 3 the last
difference is added. And when half of the equations have been processed we return
our variables to their initial state.
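The output sequence of the last stage can also be generated directly by reversing the bits of each index. The sketch below is an alternative illustration and is not the method used in the project code, which updates k with the variables z and q; the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Reverses the lowest 'bits' bits of 'value' (e.g. 00011 -> 11000).
unsigned reverse_bits(unsigned value, unsigned bits)
{
    assert(bits <= 16);
    unsigned result = 0;
    for (unsigned b = 0; b < bits; ++b) {
        result = (result << 1) | (value & 1u);
        value >>= 1;
    }
    return result;
}

// Output positions for an n-point transform with n = 2^bits.
std::vector<unsigned> bit_reversed_order(unsigned bits)
{
    std::vector<unsigned> order(1u << bits);
    for (unsigned i = 0; i < order.size(); ++i)
        order[i] = reverse_bits(i, bits);
    return order;
}
```

Calling bit_reversed_order(5) returns the 32-entry sequence 0, 16, 8, 24, 4, 20, ... listed above.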
if(b==false)
{
for(f=0;f<n/d;f++)
{
x[stage][k]=x[stage-1][j]+x[stage-1][j+n/d];
j++;
if(stage==totalStages)k+=n/2; else k++;
i++;
}
b=!b; j-=n/d;
}
else
{
for(f=0;f<n/d;f++)
{
x[stage][k]=(x[stage-1][j]-x[stage-1][j+n/d])*w[f*df];
j++;
if (stage==totalStages && q==3) k=k-(n/4)-3*z*(n/8)-2*q;
else if (stage==totalStages && q!=3) k=k-(n/4)-3*z*(n/8);
else k++;
if(z==0){z=1;}
else {z=0;}
i++; q++;
}
b=!b; j+=n/d;
}
if (stage==totalStages && i==n/2) {k=1; z=0; q=0;}
In the end we wrote three loops to print the output, the amplitude and the phase of
our results. (Refer to Appendix A for the FFT code in C++.)
3.5 Nios II based System on Chip Development Platform
Due to the availability of tools and rapid prototyping resources, the Altera Nios II
system-on-a-programmable-chip (SOPC) development system was chosen. Under the
Nios II SOPC development environment, the Nios II embedded processor serves as the
general-purpose processor, and other peripherals such as the UART, various I/O
controllers, memory, timer and custom instructions are connected to the Nios II via
the Avalon system bus, as shown in Figure 3.4.
Figure 3.4: Nios II processor system
3.5.1 Nios II Custom Instruction
With the Altera Nios II embedded processor, the system designer can accelerate
time-critical software algorithms by adding custom instructions to the Nios II
instruction set. With custom instructions, a complex sequence of standard
instructions can be reduced to a single instruction implemented in hardware. This
feature can be used for a variety of applications, for example, to optimize software
inner loops for digital signal processing (DSP). The Nios II configuration wizard,
part of the Quartus II software's SOPC Builder, provides a graphical user interface
(GUI) used to add up to 256 custom instructions to the Nios II processor.
The custom instruction logic connects directly to the Nios II arithmetic logic
unit (ALU) as shown in Figure 3.5.
Figure 3.5: Custom instruction logic connects to the Nios II ALU
3.5.2 Nios II Floating Point Custom Instruction
The floating-point custom instructions, optionally available on the Nios II
processor, implement single-precision floating-point arithmetic operations. The
designer can use these custom instructions to accelerate floating-point operations
in a Nios II C/C++ application program. This set of custom instructions is available
on every Nios II core implementation. The basic set of floating-point custom
instructions includes single-precision floating-point addition, subtraction, and
multiplication. Floating-point division is available as an extension to the basic
instruction set. Table 3.2 shows the floating-point usage of the math library.
Table 3.2: Math library floating-point usage
(Refer to Appendix B for the SOPC configuration that enables the floating point
custom instruction.)
The most important advantages of using the Nios II floating point custom
instruction are:
1. It accelerates floating-point operations in a Nios II C/C++ application program.
2. It takes full advantage of the flexibility of FPGAs to meet system performance
requirements.
3.6 Application Software
As the volume and complexity of data and results continue to grow with the
increasing complexity of data sources and algorithms, the need for intuitive
representations of that data and results becomes increasingly critical. The graphical
representation of the results is often not only the most effective means of conveying
the points of the study or work which has provided the data, but is in most cases an
expectation of the audience of the work, even as computing hardware continues to
increase in capability.
Software written in Visual Basic (VB) or plain C can do the work, but it would serve
only one particular task and it would be difficult to present the results as the
user wants.
The solution to this matter is the MATLAB software. MATLAB has the capability to
perform the analysis and plot the result. Moreover, it is easy to connect to the
hardware through the serial port; transmitting and receiving are achieved using
specific commands. After receiving the result from the hardware, MATLAB can present
and plot it. MATLAB can also be used to build a Graphical User Interface for any
kind of implementation.
3.7 ALTERA DE2 Development Kit
The target device for this design is on the circuit board available from ALTERA
called the Development and Education board, or ALTERA DE2 board, shown in Figures
3.6 and 3.7. The DE2 board features a Cyclone II 2C35 FPGA in a 672-pin package.
All important components on the board are connected to pins of this chip, allowing
the user to control all aspects of the board's operation. For simple experiments,
the DE2 board includes a sufficient number of robust switches (of both toggle and
push-button type), LEDs, and 7-segment displays. For more advanced experiments,
there are SRAM, SDRAM, and Flash memory chips, as well as a 16 x 2 character
display. For experiments that require a processor and simple I/O interfaces, it is
easy to instantiate ALTERA's Nios II processor and use interface standards such as
RS-232 and PS/2.
For experiments that involve sound or video signals, there are standard
connectors for microphone, line-in, line-out (24-bit audio CODEC), video-in (TV
Decoder), and VGA (10-bit DAC); these features can be used to create CD-quality
audio applications and professional-looking video.
For larger design projects the DE2 provides USB 2.0 connectivity (both host
and device), 10/100 Ethernet, an infrared (IrDA) port, and an SD memory card
connector. Finally, it is possible to connect other user-defined boards to the DE2
board by means of two expansion headers.
Figure 3.6: ALTERA DE2 board
Figure 3.7: Block diagram of the DE2 board.
CHAPTER 4
MATLAB GRAPHICAL USER INTERFACE
4.1 Introduction
The universal asynchronous receiver/transmitter core with Avalon interface
(UART core) implements a method to communicate serial character streams between
an embedded system on an Altera FPGA and an external device. The core
implements the RS-232 protocol timing, and provides adjustable baud rate, parity,
stop and data bits, and optional RTS/CTS flow control signals. The feature set is
configurable, allowing designers to implement just the necessary functionality for a
given system.
The core provides a simple register-mapped Avalon Memory-Mapped
(Avalon-MM) slave interface that allows Avalon-MM master peripherals (such as a
Nios II processor) to communicate with the core simply by reading and writing
control and data registers. The UART core is SOPC Builder-ready and integrates
easily into any SOPC Builder-generated system.
The UART core implements RS-232 asynchronous transmit and receive
logic. The UART core sends and receives serial data via the TXD and RXD ports.
The I/O buffers on most Altera FPGA families do not comply with RS-232
voltage levels, and may be damaged if driven directly by signals from an RS-232
connector.
4.2 Communication Interface
As shown in Figure 4.1, the MATLAB software on the host PC is connected
through a communication interface with the FFT module on the Nios II platform. In
this module, the RS-232 serial communication port is used because a serial
communication port (UART) is readily available on both the FPGA development board
and the host PC.
The Nios II UART core implements the RS-232 protocol to communicate a serial
character stream, at a baud rate of 115,200 bits per second, between the embedded
system on the FPGA development board and an external device. In our work, the
external device is the host PC, and the link is controlled through the Nios II UART
registers and functions.
Figure 4.1: Communication interface between FFT module and MATLAB software
The UART subroutine is a software module that allows the user to read from and
write to the Nios II UART buffer. For the Nios II processor, Altera provides
hardware abstraction layer (HAL) system library device drivers that enable us to
access the UART core (the address of the UART core is the base address that has
been set in SOPC Builder) using standard C/C++ library facilities such as cout and
getchar(). To read data (one byte) from the Nios II UART peripheral, the getchar()
function is used, and cout is used to write data into the UART buffer. To read data
larger than 8 bits, a program loop must be used. For example, to read four bytes of
data, the following code may be used:
for (int i=0;i<4;i++)
data[i]=getchar();
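As an illustration of how such a loop can rebuild a value wider than 8 bits, the sketch below reads four bytes and combines them into one 32-bit word. It is written against std::istream so that it can be tried on a host PC. The function name read_word and the byte order (least significant byte first) are assumptions, not properties of the project code or of the UART core.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Reads four bytes from 'in' and combines them into one 32-bit word,
// assuming (hypothetically) that the least significant byte arrives first.
// Returns false if the stream ends before four bytes are available.
bool read_word(std::istream& in, std::uint32_t& word)
{
    word = 0;
    for (int i = 0; i < 4; ++i) {
        const int c = in.get();
        if (c == std::istream::traits_type::eof()) return false;
        word |= static_cast<std::uint32_t>(static_cast<unsigned char>(c))
                << (8 * i);
    }
    return true;
}
```

On the host the function can be exercised with a std::istringstream; passing std::cin gives the same byte-by-byte behaviour as the getchar() loop.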
To write data to the Nios II UART buffer, the standard output stream cout is used.
For example, to write 8 values in decimal format to the UART device, the following
code may be used
for (int i=0;i<8;i++)
cout<<result[i];
The Nios II UART registers are declared in system.h, which is generated along with
the Nios II embedded software system library in the Nios II IDE.
The serial communication module in the MATLAB software is established using
Instrument Control Toolbox commands. It allows any type of data to be sent or
received.
4.3 Serial Port Overview
4.3.1 Serial Communication
Serial communication is the most common low-level protocol for communicating
between two or more devices. Normally, one device is a computer, while the other
device can be a modem, a printer, another computer, or a scientific instrument such
as an oscilloscope or a function generator.
The serial port sends and receives bytes of information in a serial fashion, one bit
at a time. These bytes are transmitted using either a binary format or a text (ASCII)
format. The RS-232 port is shown in Figure 4.2.
Figure 4.2: A male DE-9 connector used for a serial port on a PC-style computer.
4.3.2 The Serial Port Interface Standard
Over the years, several serial port interface standards for connecting
computers to peripheral devices have been developed. These standards include
RS-232, RS-422, and RS-485, all of which are supported by the serial port object. Of
these, the most widely used standard is RS-232, which stands for Recommended
Standard number 232.
The current version of this standard is designated as TIA/EIA-232C, which is
published by the Telecommunications Industry Association. However, the term
"RS-232" is still in popular use, and is used when referring to a serial
communication port that follows the TIA/EIA-232 standard.
Primary communication is accomplished using three pins: the Transmit Data
pin, the Receive Data pin, and the Ground pin. Other pins are available for data flow
control, but are not required.
4.3.3 Connecting Two Devices with a Serial Cable
The RS-232 standard defines the two devices connected with a serial cable as
the Data Terminal Equipment (DTE) and Data Circuit-Terminating Equipment
(DCE). This terminology reflects the RS-232 origin as a standard for communication
between a computer terminal and a modem.
The host PC is considered a DTE, while peripheral devices such as modems and
printers are considered DCEs; note that many scientific instruments function as
DTEs. Since RS-232 mainly involves connecting a DTE to a DCE, the pin
assignments are defined such that straight-through cabling is used, where pin 1 is
connected to pin 1, pin 2 is connected to pin 2, and so on. A DTE to DCE serial
connection using the transmit data (TD) pin and the receive data (RD) pin is shown
in Figure 4.3.
Figure 4.3: Connecting two devices with a serial cable
4.4 MATLAB Software
MATLAB is a high-performance language for technical computing. It
integrates computation, visualization, and programming in an easy-to-use
environment where problems and solutions are expressed in familiar mathematical
notation.
MATLAB is an interactive system whose basic data element is an array that
does not require dimensioning. This allows you to solve many technical computing
problems, especially those with matrix and vector formulations, in a fraction of the
time it would take to write a program in a scalar non-interactive language such as C
or FORTRAN.
MATLAB has extensive facilities for displaying vectors and matrices as
graphs, as well as annotating and printing these graphs. It includes high-level
functions for two-dimensional and three-dimensional data visualization, image
processing, animation, and presentation graphics. It also includes low-level functions
that allow you to fully customize the appearance of graphics as well as to build
complete graphical user interfaces on your MATLAB applications.
4.5 MATLAB Integration with ALTERA DE2 Board
In this operation, a serial port was chosen for the connection between MATLAB
and the DE2 board, which means that all input and output pass through this serial
connection. The connection is shown in Figure 4.4.
Figure 4.4: Integration of MATLAB with the DE2 board
Now the question is: how does this connection work?
First, create a serial port object by using the serial function and configure property
values during object creation. The following code may be used:
% Create a serial port object.
S = serial ('COM6','baudrate', 115200);
This requires the COM port to be used and the baud rate of the connection; in this
case we use 115200 bits per second. Now the serial port object "S" exists in the
MATLAB workspace. Second, confirm that the serial port object has been created with
the intended settings. The following code can be used:
% Confirm the settings of the serial port object
get(S,{'Name','Port','baudrate','DataBits','Type'})
Before we can perform a read or write operation, the serial port object must be
connected to the device. With the "fopen" function the serial port object connects
to the DE2 board and becomes ready to receive data.
% Connect to serial port object, S.
fopen(S);
After this, the connection is ready to transmit and receive in both directions. At
this point the read and write operations can be carried out and the results
displayed in the MATLAB window. When the receiving operation is completed, we
should disconnect the serial port object from the board, and remove it from memory
and from the MATLAB workspace. The following code can be used:
% Disconnect the connection
fclose (S);
% Delete the serial port object
delete( S);
(Refer to Appendix C for the MATLAB code.)
CHAPTER 5
RESULTS AND PERFORMANCE EVALUATION
The results of the integrated FFT embedded system and the performance evaluation
of the Nios II embedded processor are presented in this chapter.
5.1 Introduction
During the design procedure and upon its completion, simulation plays an
important role in showing whether the design is correct. Firstly, the FFT embedded
system was verified through functional simulation. The performance of the floating
point custom instruction was then evaluated by comparison with the embedded
software implementation. Spectral analysis was carried out as an application of this
system, in addition to finding the FFT coefficients.
5.2 System Results
After we built the embedded system and downloaded it to the DE2 board, we
established the connection between MATLAB and the board via the serial port. First,
we sent the input from MATLAB. The input values are a ramp function running from 1
to 32. The system then received the input, converted it to ASCII code and computed
its FFT. Figure 5.1 illustrates the input.
Figure 5.1: Ramp function
After the FFT of the inputs is calculated, the output is transmitted through the
serial port to MATLAB and displayed in the Command Window, as shown in
Figure 5.2. The output is received here as ASCII type.
Figure 5.2: Output in MATLAB command window
Before plotting the result, we converted the output from ASCII code to
characters and then to double (complex) type. The following code can be used:
% Convert the result from ASCII code to character
d=char(S);
% Convert the result to complex number
w=str2num(d);
Once this step is achieved, we plot the result as shown in Figure 5.3.
Figure 5.3: The final output of ramp function
Let's try another kind of function, a step function, as the input. The input and the
result are shown in Figures 5.4 and 5.5.
Figure 5.4: Step function
Figure 5.5: Final output of step function
5.3 Performance Evaluation
To understand and evaluate the performance of the FFT embedded system,
we built a system without the floating point custom instruction and inserted a timer
in the system for the purpose of counting the clock cycles of the operation. Next,
we measured how many cycles were taken to execute the program. Then we took
note of the results when the floating point custom instruction is applied in the
system. Figures 5.6 and 5.7 show the number of clock cycles.
Figure 5.6: Console view displaying Nios II hardware output using floating point
custom instruction for step function
Figure 5.7: Console view displaying Nios II hardware output without floating
point custom instruction for step function
After obtaining both results we can compare which system performs better and
executes faster. Table 5.1 shows the comparison between the system that includes
the floating point custom instruction and the system that does not.
Table 5.1: Comparison between the system that includes the floating point custom
instruction and the system that does not, for several functions
5.4 Spectral Analysis
After evaluating the performance of the system, we used the FFT function for
spectral analysis. A common use of the FFT is to find the frequency components of
a time-domain signal.
First, a sine wave signal is created in MATLAB and sent to the FFT embedded
system. After calculating the FFT, the system finds the amplitude and phase
spectra. The results are then returned to MATLAB for display. Figures 5.8 and 5.9
show this operation.
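The idea of the spectral analysis can be sketched on a host PC as follows. To keep the example short it uses a direct O(n^2) discrete Fourier transform rather than the FFT module of this project, and all function names are hypothetical. A sine wave with a whole number of cycles in the record produces a single amplitude peak at the bin equal to that number of cycles.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Direct DFT, used here only to keep the sketch short.
std::vector<std::complex<double>> dft(const std::vector<double>& x)
{
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * static_cast<double>(k * t)
                                 / static_cast<double>(n);
            sum += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        X[k] = sum;
    }
    return X;
}

// Index of the largest-amplitude bin among bins 0 .. n/2.
std::size_t peak_bin(const std::vector<std::complex<double>>& X)
{
    assert(!X.empty());
    std::size_t best = 0;
    for (std::size_t k = 1; k <= X.size() / 2; ++k)
        if (std::abs(X[k]) > std::abs(X[best])) best = k;
    return best;
}

// n samples of a sine wave with 'cycles' full periods across the record.
std::vector<double> sine_record(std::size_t n, double cycles)
{
    const double pi = std::acos(-1.0);
    std::vector<double> x(n);
    for (std::size_t t = 0; t < n; ++t)
        x[t] = std::sin(2.0 * pi * cycles * static_cast<double>(t)
                        / static_cast<double>(n));
    return x;
}
```

For a 32-sample record containing 4 cycles, the peak is at bin 4 with amplitude n/2 = 16 and a phase of -90 degrees, as expected for a sine wave.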
Figure 5.8: Sinusoidal signal
Figure 5.9: Result of sinusoidal signal
CHAPTER 6
CONCLUSION
The conclusions of the entire experiment and the project are presented in this
chapter. Recommendations for enhancing the precision and performance of FFT
embedded system are also included in this chapter. The recommendations include
speed and logic cells requirements.
6.1 Concluding Remarks
This thesis demonstrates the design of an embedded system and the FPGA
implementation of the Fast Fourier Transform algorithm. The algorithm used was
radix-2 decimation-in-time for 32 floating-point samples. The FFT embedded system
included the floating point custom instruction as an alternative to software for
the floating point arithmetic operations.
The floating point custom instruction has given the system better
performance and speed in floating point operations, as shown in the results.
Moreover, in this thesis I introduced a new technique for providing any kind of
data to the FPGA development board from the host PC by using the MATLAB software,
instead of creating a dedicated GUI in Visual Basic or C. This method makes the
connection easier, less complicated and more practical.
Finally, our experiments thus far have demonstrated promising results,
indicating that floating point custom instructions can result in large improvements
in performance, energy, and timing, while significantly reducing design turnaround
time.
6.2 Recommendation for Future Work
There are many ways in which the designed FFT embedded system can be
improved; for example, a higher N-point FFT computation can be introduced, and the
decimation-in-frequency algorithm architecture and a higher radix can be used to
make the design more robust.
6.2.1 Higher N-Point FFT Computation
The FFT embedded system takes only 32 floating-point values as its input data
samples. A 32-point transform is not suitable for real applications or for systems
that require high precision. Therefore a higher N-point FFT should be designed.
Figure 6.1 shows the difference between the 32-point and the 1024-point FFT for a
step discrete time signal.
Figure 6.1: FFT for ramp discrete time signal (a) N=32, (b) N=1024
6.2.2 The Algorithm Architecture In The Decimation-In-Frequency
The two radix-2 algorithms, Decimation-In-Time (DIT) and Decimation-In-Frequency
(DIF), were discussed in Chapter 2. The DIF is more widely used compared to DIT,
especially in orthogonal-frequency-division-multiplexing (OFDM) systems.
Figure 6.2 shows the differences between the DIT and DIF algorithms and
Figure 6.3 shows the thirty-two-point decimation-in-frequency FFT algorithm.
Figure 6.2: Butterfly algorithms for (a) DIF and (b) DIT
Figure 6.3: Thirty-two-point Decimation-In-Frequency FFT algorithm
6.2.3 High Radix Used
The radix-2 butterfly is the basic element used to build the complete butterfly
processing element. Using a higher radix can reduce the number of arithmetic
operations, notably complex multiplications, needed for the transform. As a result,
the benefit of using a higher radix for the FFT needs to be studied. Figures 6.4
and 6.5 show the structures of the radix-4 butterfly processing element.
Figure 6.4: Basic butterfly computations in a radix-4 FFT algorithm
Figure 6.5: Radix-4 for a 16-point FFT
6.2.4 Use of the System in Other Application
The previous chapter introduced spectral analysis as one of the basic
applications of FFT algorithms. However, this system can also be used in several
other applications such as image processing, filtering (for example low-pass
filtering) and fast convolution. The FFT algorithm has been used effectively in
image processing to remove noise by converting the image from the spatial domain to
the frequency domain. There the noise is reduced to lines in the image which can
easily be eliminated.
Moreover, the FFT algorithm can be used to perform a fast version of
convolution. Since the output of a linear time-invariant discrete-time system is the
convolution of the input and the unit-pulse response, this can be used to compute the
output response. The FFT algorithm can also be used to perform filtering to
eliminate undesirable frequencies.
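The convolution property mentioned above can be sketched as follows: both sequences are zero-padded to the output length, transformed, multiplied bin by bin and transformed back. As an assumption made to keep the example short, a direct O(n^2) transform is used in place of an FFT; the speed advantage only appears when an FFT is substituted. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> Cplx;

// Direct DFT (sign = -1) or unscaled inverse DFT (sign = +1).
std::vector<Cplx> transform(const std::vector<Cplx>& x, int sign)
{
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<Cplx> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        Cplx sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * static_cast<double>(k * t)
                                 / static_cast<double>(n);
            sum += x[t] * Cplx(std::cos(angle), std::sin(angle));
        }
        X[k] = sum;
    }
    return X;
}

// Linear convolution of a and b computed in the frequency domain.
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b)
{
    assert(!a.empty() && !b.empty());
    const std::size_t n = a.size() + b.size() - 1;  // zero-padded length
    std::vector<Cplx> A(n), B(n);
    for (std::size_t i = 0; i < a.size(); ++i) A[i] = a[i];
    for (std::size_t i = 0; i < b.size(); ++i) B[i] = b[i];
    A = transform(A, -1);
    B = transform(B, -1);
    for (std::size_t k = 0; k < n; ++k) A[k] *= B[k];  // multiply spectra
    A = transform(A, +1);
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i)
        y[i] = A[i].real() / static_cast<double>(n);   // inverse scaling
    return y;
}
```

For example, convolving the input 1, 2, 3 with the unit-pulse response 1, 1 gives 1, 3, 5, 3, the same result as direct convolution.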
REFERENCES
1. Edward W. Kamen, Bonnie S. Heck.”Fundamentals of Signals and Systems
using MATLAB”. Prentice Hall 1997.
2. Thomas L. Floyd. “Digital Fundamentals”. Pearson Education 2003
3. VECAD Technical Report (veCAD-NIOS-TUT-TR2007009) Nios II Tutorial:
Custom Instruction (Multi-Cycle Custom Instruction Architecture) version 1.1
4. Monson H. Hayes. “Digital Signal Processing”. Schaum’s outlines 1999.
5. Mohamed Khalil Hani. “Outline of Digital Systems HDL-based Design”. UTM
2006.
6. MOHD NAZRIN. “The Implementation of Fast Fourier Transform (FFT)
Radix-2 Core Processor using VHDL in FPGA-Based Hardware”. UTM 2003.
7. Altera Corporation (2004a). “UART core with Avalon Interface”
8. Altera Corporation (2007). “Nios II Processor Reference Handbook”
9. Altera Corporation. “Using Nios II Floating Point Custom Instruction”
64
10. Stanley B.Lippman, Josse Lajoie, Barbara E. Moo. “C++ PRIMER”. Addison
Wesley 2005.
11. Altera Corporation. “Nios II Custom Instruction User Guide”
12. Vinay K. Ingle and John G. Proakis.” Digital Signal Processing Using
MATLAB (Bookware Companion Series)”. Homson-Engineering 1999.
13. Altera Corporation.”Quartus II Version 7.2 Handbook Volume 4: SOPC
Builder”
14. Altera Corporation.” DE2 Development and Education Board”
15. Chih-Wei Liu, (2005), Introduction to FFT Processor. National Chiao-Tung
University.
16. J.D. Bruguera and T.Lang,(1995), “Implementation of The FFT Butterfly With
Redundant Arithmetic”, University of California.
17. Stephen Brown (2000), “Fundamentals of Digital Logic with VHDL Design”,
McGraw- Hill International Editions.
18. Nabeel Shirazi, Peter M. Athanas, and A. Lynn Abbott (1995), Implementation
of a 2-D Fast Fourier Transform on a FPGA-Based Custom Computing
Machine. Virginia Polytechnic Institute and State University.
19. Altera Corporation, “Nios II Hardware Development Tutorial”
20. Edix Cetin, Richard C.S.Morling and Izzet Kale,(1997) “ An Integrated 256-point
Complex FFT Processor for Real-Time Spectrum Analysis and Measurement”, IEEE
Proceedings of Instrument and Measurement Technology Conference, Vol. 1.96-101.
21. Weidong Li and Lars Wanhammar,(1996) “A Pipeline FFT Processor”, IEEE
Workshop on Signal Processing System, pages 654-622.
65
22. Sergey E. Lyshevski. ” Engineering and Scientific Computations Using
MATLAB”. Wiley Interscience 2005.
23. MathWorks Corporation. “Getting Started with MATLAB 7”
24. Patrick Marchand and O.Thomas Holland.”Graphics and GUIs with
MATLAB”, third edition Chapman & Hall/CRC.2003
25. Vinay K. Ingle and John G. Proakis.” Digital Signal Processing Using
MATLAB (Bookware Companion Series) ”. homson-Engineering 1999.
26. B. Fagin and C. Renard, “Field Programmable Gate Arrays and Floating Point
Arithmetic,” IEEE Transactions on VLSI, Vol. 2, No. 3, September 1994,
pp.365-367.
27. N. Shirazi, A. Walters, and P. Athanas, “Quantitative Analysis of Floating Point
Arithmetic on FPGA Based Custom Computing Machines,” To appear at IEEE
Symposium on FPGAs for Custom Computing Machines, April 1995.
28. LISATek Products [Online]. Available: http://www.coware.com.
29. J. A. Fisher, “Customized instruction sets for embedded processors,” in Proc.
Design Automation Conf., June 1999, pp. 253–257.
30. MATLAB 6. 5 Release 13, CD-ROM, Mathworks, Inc., 2002.
APPENDIX A
FFT CODE IN C++
#include <iostream>
#include <cmath>
#include <cstdio>
using namespace std;

#define PI 3.14159265358979323846264338327950288419716939937510

//Define complex numbers
class Cmplx
{
private:
    float r,i;
public:
    Cmplx(){r=0; i=0;}
    Cmplx(float real,float im){r=real; i=im;}
    //to use cout<< with Cmplx
    friend ostream &operator<<(ostream &output, const Cmplx &Cmp)
    {
        if(Cmp.r==0&&Cmp.i==0)output<<"0";
        if(Cmp.r!=0)output<<Cmp.r;
        if(Cmp.i>0)output<<"+"<<Cmp.i<<"i";
        if(Cmp.i<0)output<<"-"<<-Cmp.i<<"i";
        output<<" ";
        return output;
    }
    //to add, subtract and multiply Cmplx
    Cmplx operator+(Cmplx y){return Cmplx(r+y.r,i+y.i);}
    Cmplx operator-(Cmplx y){return Cmplx(r-y.r,i-y.i);}
    Cmplx operator*(Cmplx y)
    {
        float real, im;
        real=(r*y.r)-(i*y.i);
        im=(r*y.i)+(i*y.r);
        return Cmplx(real,im);
    }
};

int main()
{
    while(1){
        bool b=false;
        int n=32,i,j,k,f,s=0,stage=5,d,df,z,q;
        signed char buff;
        Cmplx x[6][32], w[16];
        //Read n signed 8-bit samples, echo them, then convert them into complex
        for(i=0;i<n;i++)
        {
            buff = getchar();
            cout<<buff;
            x[s][i]=Cmplx(buff,0);
        }
        //Calculate Ws
        for(i=0;i<n/2;i++)
            w[i]=Cmplx(cos(i*2*PI/n),-sin(i*2*PI/n));
        //Calculate all Stages
        while(s<stage) {
            s++;
            d=1<<s;        //d = 2^s
            df=1<<(s-1);   //df = 2^(s-1)
            j=0; k=0; f=0; i=0; z=0; q=0;
            while(i<n)
            {
                if(b==false)
                {
                    //upper half of the block: sums
                    for(f=0;f<n/d;f++)
                    {
                        x[s][k]=x[s-1][j]+x[s-1][j+n/d];
                        j++;
                        if(s==stage)k+=n/2; else k++;
                        i++;
                    }
                    b=!b; j-=n/d;
                }
                else
                {
                    //lower half of the block: differences times twiddle factor
                    for(f=0;f<n/d;f++)
                    {
                        x[s][k]=(x[s-1][j]-x[s-1][j+n/d])*w[f*df];
                        j++;
                        if(s==stage && q==3)k=k-(n/4)-3*z*(n/8)-2*q;
                        else if (s==stage && q!=3)k=k-(n/4)-3*z*(n/8);
                        else k++;
                        if(z==0){z=1;}
                        else {z=0;}
                        i++; q++;
                    }
                    b=!b; j+=n/d;
                }
                if(s==stage && i==n/2) {k=1; z=0; q=0;}
            }//end while(i<n)
        }//end while(s<stage)
        cout<<"\nOutput"<<endl;
        for(i=0;i<n;i++)cout<<"x"<<"("<<i<<") = "<<x[stage][i]<<endl;
        cout<<endl;
    }//end while(1)
    return 0;
}//end main
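The index arithmetic on k in the final stage of the listing above stores each result at the bit-reversed position of its natural DIF output index, so that the output array can be printed in natural order. A more conventional way to express the same reordering is an explicit bit-reversal function. The sketch below is an illustration with a hypothetical name (bit_reverse); it is not part of the original program.

```cpp
#include <cassert>

// Reverse the lowest `bits` bits of `index`
// (bits = 5 for a 32-point FFT, so index lies in 0..31).
unsigned bit_reverse(unsigned index, unsigned bits) {
    assert(bits <= 8 * sizeof(unsigned));
    unsigned reversed = 0;
    for (unsigned b = 0; b < bits; ++b) {
        reversed = (reversed << 1) | (index & 1u);  // move the lowest bit over
        index >>= 1;
    }
    return reversed;
}
```

For a 32-point transform, bit_reverse(1, 5) returns 16, bit_reverse(2, 5) returns 8 and bit_reverse(3, 5) returns 24, which matches the order 0, 16, 8, 24, ... in which the final stage of the listing fills the output array.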
APPENDIX B
BUILDING AND CONFIGURATION OF THE EMBEDDED SYSTEM INTO
NIOS II EMBEDDED PROCESSOR
1- CPU
Create the CPU in the Nios II processor system. Figure B.1 shows the CPU window.
Figure B.1: CPU window
2- JTAG UART
Create the JTAG UART in the Nios II processor system. Figure B.2 shows the JTAG UART
window.
Figure B.2: JTAG UART window
3- Timer
Create the timer in the Nios II processor system. Figure B.3 shows the Timer window.
Figure B.3: Timer window
4- Floating Point Custom Instruction
After the CPU has been created and generated:
Open SOPC Builder.
Edit the Nios II CPU, as shown in Figure B.4.
Figure B.4: The system components
Then click the Custom Instruction tab and add the Floating Point Hardware, as
shown in Figure B.5.
Figure B.5: Floating point custom instruction location in Nios II embedded processor
Tick Use Floating Point Division hardware to include floating point division
and click Finish, as shown in Figure B.6.
Figure B.6: Floating point hardware
Now that the floating point hardware has been added, click Finish to continue, as
shown in Figure B.7.
Figure B.7: Complete the configuration of the floating point custom instruction
Generate the HDL for your SOPC Builder system. When the generation
process is complete, exit SOPC Builder. The Verilog module that instantiates the
generated system is as follows:
module FFT(clk,SD_CLK, SD_ADDR, SD_BA, SD_CASN, SD_CKE, SD_CSN, SD_DQ, SD_DQM,
SD_RASN, SD_WEN,rxd_to_the_uart,txd_from_the_uart,pio);
input clk,rxd_to_the_uart;
//SDRAM input output//
output [11:0] SD_ADDR;
output [1:0] SD_BA, SD_DQM;
inout [15:0] SD_DQ;
output SD_CLK,SD_CASN,SD_CKE,SD_CSN,SD_RASN,SD_WEN,txd_from_the_uart;
//pio: declare its direction and width to match the PIO component of the SOPC system//
//reset input of the SOPC system//
wire reset_switch;
//call Phase-Locked Loop//
PLL phase(clk,SD_CLK);
//call TESTING_SOPC with reset -->1 //
TESTING_SOPC system_sopc(clk,reset_switch,pio,SD_ADDR,SD_BA,SD_CASN,
SD_CKE,SD_CSN,SD_DQ,SD_DQM,SD_RASN,SD_WEN,rxd_to_the_uart,txd_from_the_uart);
assign reset_switch=1'b1;
endmodule
Compile the Quartus II project, as shown in Figure B.8.
Figure B.8: The entire system components in Quartus II
5- UART
Open the project in the FFT folder.
Open SOPC Builder.
Click Interface Protocols, as shown in Figure B.9.
Figure B.9: Interface Protocols
Then click Serial, and after that click UART (RS-232 Serial Port) to include it
in the design.
Generate the HDL for your SOPC Builder system. When the generation process is
complete, exit SOPC Builder.
Compile the Quartus II project.
After the compilation is complete, the UART (RS-232 Serial Port) will have been
added to the Nios II system in the Project Navigator entity list.
6- The serial connection in Nios II embedded processor
Right-click the FFT project in the Nios II C/C++ Projects view,
click System Library Properties, change the settings as shown in
Figure B.10, and then click OK.
Figure B.10: System library properties
Click Build Project.
When the build finishes, go to the menu bar, select Run >> Run…, change
the settings as shown in Figure B.11, and then click Run.
Figure B.11: Run the setting
The IDE downloads the program to the FPGA board and starts execution. The
console window then displays the message from the Nios II processor, as shown in
Figure B.12. When this message appears, the hardware is ready to receive
data via the serial port.
Figure B.12: Complete configuration of the serial connection
APPENDIX C
MATLAB CODE FOR CONNECTION WITH DE2 BOARD
% Create a serial port object.
obj1 = serial('COM6','baudrate',115200);
% Connect to the serial port object, obj1.
fopen(obj1);
% Input values
a=[0:31];
% Send the input as signed character values
fwrite(obj1,a,'schar');
% Wait until the hardware has sent back at least 40 bytes
while(obj1.BytesAvailable < 40)
    pause(0.1)
end
% Receive the result from the FPGA board
S=fread(obj1);
% Disconnect from the serial port
fclose(obj1);
% Delete the serial port object
delete(obj1);
% Convert the result from ASCII code to character
d=char(S);
C=d';
% Convert the result to complex number
w=str2num(C);
T=w';
% Plot the result
subplot (3,1,1);plot(real(T)) ;
xlabel('Point number');ylabel('Real part');grid on
subplot (3,1,2);plot(imag(T),'m') ;
xlabel('Point number');ylabel('Imaginary part');grid on
subplot (3,1,3);plot(real(T),imag(T),'g');
xlabel('Real part');ylabel('Imaginary part');grid on
figure,subplot (2,1,1);stem(real(T),'b');
xlabel('Point number');ylabel('Real part');grid on
subplot (2,1,2);stem(imag(T),'r');
xlabel('Point number');ylabel('Imaginary part');grid on