FFT library v. 2.0 benchmarks, Sep 2009
Complex/real FFT, 16/32-bit FFT, radix-4/2 FFT, windowing, square root and magnitude functions for Cortex-M3
Ivan Mellen, Embedded Signals, [email protected]

• All functions were written in hand-optimized assembly code.
• Small speed improvements are still possible, especially for small FFT sizes or high-latency configurations.
• Not all benchmarks were performed, due to the large number of combinations (function / size / latency / worst or best case).
• Benchmark values include the C call overhead (measured without C compiler optimization, i.e. worst case).
• Tested on real hardware (STM32).
• Lat0, Lat1 and Lat2 in the benchmark tables specify the STM32 flash latency (0, 1 or 2 wait states).
• Latency 0 benchmarks are directly applicable to other Cortex-M3 implementations.
• Some functions have been ported to other ARM cores (e.g. ARM9E).

Brief FFT library v. 2.0 description:
• Three groups of functions:
  o Windowing functions
  o Fast Fourier Transform
  o Complex magnitude (absolute value of the complex frequency)
• Three library versions:
  o GCC (Rowley CrossWorks, Raisonance, …)
  o Keil MDK-ARM
  o IAR Embedded Workbench
• Windowing functions (e.g. Hamming window):
  o Perform speed-optimized windowing of the input signal before the FFT
  o The 16-to-32-bit version properly scales the 16-bit signal for the 32-bit FFT
• FFT functions:
  o Complex and real FFT, 16-bit and 32-bit versions
  o Radix-4/2 FFT, sizes 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096
  o Inverse FFT available
  o The 32-bit FFT increases the dynamic range by 90 dB but needs 20% to 50% more cycles
  o Coefficients are located in flash; placing them in RAM gives a faster FFT at higher flash latencies
• Magnitude functions:
  o Calculate the complex frequency magnitude, mag = sqrt(re² + im²)
  o Based on a custom 32-bit square root algorithm (7 cycles)
  o Multiple precision/speed variants for 32-bit frequencies (a 64-bit square root is needed)

Windowing functions benchmarks (cycles, Lat0)

Points  Window16b_real /        Window16b_complex /     Window32to32b_real      Window32to32b_complex
        Window16to32b_real      Window16to32b_complex
16      123                     199                     137                     229
32      217                     369                     243                     427
64      405                     709                     455                     823
128     781                     1389                    879                     1615
256     1533                    2749                    1727                    3199
512     3037                    5469                    3423                    6367
1024    6045                    10909                   6815                    12703
2048    12061                   21789                   13599                   25375
4096    24093                   43549                   27167                   50719

The windowing table also lists best-case Lat1/Lat2 values of 6174/6303 (Window16b_real, 1024 points) and 6950/7084, and one worst-case entry of 7842/7974/8108 (Lat0/Lat1/Lat2); the function and size of the last two entries are not identified.

Windowing functions, 1024 points, coefficients in RAM vs. flash (cycles)

                            Coefficients in RAM     Coefficients in flash
Function                    Lat0    Lat1    Lat2    Lat0    Lat1    Lat2
Window16b_real /16to32      6045    6174    6303    6045    6558    7324
Window16b_complex /16to32   10909   11038   11167   10909   11422   12188
Window32to32b_real          6815    6944    7073    6815    7584    8734
Window32to32b_complex       12703   12832   12961   12703   13472   14622

Magnitude functions benchmarks (cycles, Lat0)

Points  magnitude16_16bIn  magnitude16_32bIn  magnitude24_32bIn  magnitude32_32bIn  magnitude32_32bIn
                                                                 (best case)        (worst case)
16      193                193                268                240                275
32      393                393                556                496                571
64      793                793                1132               1008               1163
128     1593               1593               2284               2032               2347
256     3193               3193               4588               4080               4715
512     6393               6393               9196               8176               9451
1024    12793              12793              18412              16368              18923
2048    25593              25593              36844              32752              37867
4096    51193              51193              n/a                65520              75755

Magnitude functions, 1024 points (cycles)

Function                          Lat0    Lat1    Lat2
magnitude16_16bIn                 12793   14327   15860
magnitude16_32bIn                 12793   14327   15860
magnitude24_32bIn                 18412   20457   24035
magnitude32_32bIn (best case)     16368   18413   21991
magnitude32_32bIn (worst case)    18923   21479   25568

FFT functions benchmarks (cycles, Lat0, best case)

Points  real FFT 16b    complex FFT 16b   real FFT 32b    complex FFT 32b
16      494             n/a               604             n/a
32      1021            n/a               1331            677
64      2548            1659              3262            1924
128     5377            3575              7075            4326
256     12652           9027              16334           10772
512     26701           19425             35055           23859
1024    60871           46298             78634           56171
2048    127705          98541             167099          122111
4096    285152          226820            368062          278015

n/a: no value listed. The complex IFFT 16b is listed for 16, 64, 256, 1024 and 4096 points, without cycle counts.

FFT functions, other flash latencies and worst case (cycles)

                          Best case                  Worst case
Function, points          Lat0     Lat1     Lat2     Lat0     Lat1     Lat2
real FFT 16b, 1024        60871    64959    80253    n/a      n/a      n/a
complex FFT 16b, 128      3575     3797     4588     3597     n/a      n/a
complex FFT 16b, 512      19425    20685    25144    19475    n/a      n/a
complex FFT 16b, 2048     98541    105113   128070   98803    n/a      n/a
real FFT 32b, 1024        78634    85205    94627    97386    103942   112592