FFT library v 2 - Embedded Signals

FFT library v. 2.0 benchmarks
Sep 2009
Complex/real FFT, 16/32-bit FFT, radix-4/2 FFT, windowing, square root and magnitude functions for Cortex-M3.
Ivan Mellen
Embedded Signals
• All functions are written in hand-optimized assembly code.
• Small speed improvements are still possible, especially for small FFT sizes or high-latency configurations.
• Not all benchmarks were performed, due to the large number of combinations (function / size / latency / worst or best case).
• Benchmark values include C call overhead (without C optimization, worst case).
• Tested on real hardware (STM32).
• Lat0, Lat1 and Lat2 in the benchmark tables specify the STM32 flash latency (0, 1 or 2 wait states).
• Latency 0 benchmarks are directly applicable to other Cortex-M3 implementations.
• Some functions have been ported to other ARM cores (e.g. ARM9E).
Brief FFT library v 2.0 description:
• Three groups of functions:
  o Windowing functions
  o Fast Fourier Transform
  o Complex magnitude (absolute value of the complex frequency)
• Three library versions:
  o GCC (Rowley CrossWorks, Raisonance, …)
  o Keil MDK-ARM
  o IAR Embedded Workbench
• Windowing functions (e.g. Hamming window):
  o Perform speed-optimized windowing of the input signal before the FFT
  o The 16-to-32-bit version performs proper scaling of the 16-bit signal for the 32-bit FFT
• FFT functions:
  o Complex and real FFT, 16-bit and 32-bit versions
  o Radix-4/2 FFT, sizes 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096
  o Inverse FFT available
  o The 32-bit FFT increases dynamic range by 90 dB but needs 20% to 50% more cycles
  o Coefficients are located in Flash; placing them in RAM gives a faster FFT at higher flash latencies
• Magnitude functions:
  o Calculate the complex frequency magnitude mag = sqrt(re² + im²)
  o Based on a custom 32-bit square root algorithm (7 cycles)
  o Multiple precision/speed variants for 32-bit frequencies (a 64-bit square root is needed)
Windowing functions benchmarks (cycles, Lat0)
Points                                       16    32    64   128   256   512  1024  2048  4096
Window16b_real / Window16to32b_real         123   217   405   781  1533  3037  6045 12061 24093
Window16b_complex / Window16to32b_complex   199   369   709  1389  2749  5469 10909 21789 43549
Window32to32b_real                          137   243   455   879  1727  3423  6815 13599 27167
Window32to32b_complex                       229   427   823  1615  3199  6367 12703 25375 50719
Additional best-case results, Lat1 / Lat2: 6174 / 6303, 6950 / 7084
Additional worst-case results, Lat0 / Lat1 / Lat2: 7842 / 7974 / 8108

1024 points (cycles; RAM / Flash = location of the coefficients)
Function                                     RAM Lat1    RAM Lat2        Lat0  Flash Lat1  Flash Lat2
Window16b_real / Window16to32b_real              6174        6303        6045        6558        7324
Window16b_complex / Window16to32b_complex       11038       11167       10909       11422       12188
Window32to32b_real                               6944        7073        6815        7584        8734
Window32to32b_complex                           12832       12961       12703       13472       14622
Magnitude functions benchmarks (cycles; - = not measured)

Best case, Lat0
Points                16    32    64   128   256   512  1024  2048  4096
magnitude16_16bIn    193   393   793  1593  3193  6393 12793 25593 51193
magnitude16_32bIn    193   393   793  1593  3193  6393 12793 25593 51193
magnitude24_32bIn    268   556  1132  2284  4588  9196 18412 36844     -
magnitude32_32bIn    240   496  1008  2032  4080  8176 16368 32752 65520

Best case, 1024 points, Lat1 / Lat2
magnitude16_16bIn    14327 / 15860
magnitude16_32bIn    14327 / 15860
magnitude24_32bIn    20457 / 24035
magnitude32_32bIn    18413 / 21991

Worst case, Lat0
Points                16    32    64   128   256   512  1024  2048  4096
cycles               275   571  1163  2347  4715  9451 18923 37867 75755
Worst case, Lat1 / Lat2: 21479 / 25568
FFT functions benchmarks (cycles; - = not measured)

Best case, Lat0
Points                16      32      64     128     256     512    1024    2048    4096
real FFT 16b         494    1021    2548    5377   12652   26701   60871  127705  285152
complex FFT 16b        -    1659    3575    9027   19425   46298   98541  226820       -
complex IFFT 16b       -       -    3597       -   19475       -   98803       -       -
real FFT 32b         604    1331    3262    7075   16334   35055   78634  167099  368062
complex FFT 32b      677    1924    4326   10772   23859   56171  122111  278015       -

Best case, Lat1 / Lat2: 64959 / 80253, 3797 / 4588, 20685 / 25144, 105113 / 128070, 85205 / 94627
Worst case, Lat0 / Lat1 / Lat2: 97386 / 103942 / 112592