FAST FOURIER TRANSFORM
The magic behind the equations
INTRODUCTION - 1
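Fourier Transform

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$$

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega$$

Discrete Time Fourier Transform

$$F(\Omega) = \sum_{n=0}^{N-1} f(n)\, e^{-j\Omega n}, \quad n = 0..N-1$$

Discrete Fourier Transform

$$F(p) = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi \frac{p}{M} n}, \quad n = 0..N-1,\ p = 0..M-1$$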
INTRODUCTION - 2
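When N=M

$$F(p) = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi \frac{p}{N} n}, \quad n, p = 0..N-1$$

$$f(n) = \frac{1}{N} \sum_{p=0}^{N-1} F(p)\, e^{j 2\pi \frac{p}{N} n}$$

Let's evaluate the computational effort. With

$$W = e^{-j \frac{2\pi}{N}}$$

the transform pair becomes

$$F(p) = \sum_{n=0}^{N-1} f(n)\, W^{np}, \quad n, p = 0..N-1$$

$$f(n) = \frac{1}{N} \sum_{p=0}^{N-1} F(p)\, W^{-np}$$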
INTRODUCTION - 3
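The following applies

$$\begin{bmatrix} F(0) \\ F(1) \\ \vdots \\ F(N-1) \end{bmatrix} =
\begin{bmatrix}
W^0 & W^0 & W^0 & \cdots & W^0 \\
W^0 & W^1 & W^2 & \cdots & W^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
W^0 & W^{N-1} & W^{2(N-1)} & \cdots & W^{(N-1)^2}
\end{bmatrix}
\begin{bmatrix} f(0) \\ f(1) \\ \vdots \\ f(N-1) \end{bmatrix}$$

where the NxN matrix for W contains complex numbers, but these numbers can be computed off-line and stored
For each F(p) there are
N complex multiplications
N-1 complex additions
And there are N values of F(p), so
$N^2$ complex multiplications
N(N-1) complex additions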
INTRODUCTION - 4
Although the first row and the first column of the matrix are all 1, meaning that some effort is saved, still
1 complex multiply uses 4 real multiplies and 2 real additions
1 complex addition uses 2 real additions
The result is almost
$4N^2$ real multiplies
$2N^2 + 2N(N-1)$ real additions
So the computational effort is proportional to $N^2$
For a DFT (Discrete Fourier Transform) of a sequence of 1024 samples, about a million complex multiplications ($1024^2 = 1{,}048{,}576$) are needed
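The matrix form of the DFT can be checked numerically. The following is a minimal sketch, assuming a small length N = 8 and an arbitrary test sequence (both chosen here for illustration, not taken from the slides); it compares the matrix product with MATLAB's built-in fft and evaluates the real-operation counts from the formulas in this section.

```matlab
% Direct DFT as a matrix-vector product (illustrative sketch).
N = 8;                              % small example length (assumption)
f = (1:N).';                        % arbitrary test sequence (assumption)
n = 0:N-1;
W = exp(-1j*2*pi/N);                % W = e^{-j 2 pi / N}
Wmat = W .^ (n.' * n);              % N x N matrix with entries W^(n*p)
F_direct = Wmat * f;                % N^2 complex multiplies, N(N-1) complex additions
F_ref = fft(f);                     % built-in reference
err = max(abs(F_direct - F_ref));   % expected to be at rounding-error level
real_mults = 4*N^2;                 % real multiplies, counted as on the slide
real_adds  = 2*N^2 + 2*N*(N-1);     % real additions, counted as on the slide
```

The counts grow with the square of N, so doubling N roughly quadruples the work.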
FAST FOURIER TRANSFORM - 1
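Let's take the basic DFT and split the summation in 2 parts, one for even n and one for odd n

$$F(p) = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N} 2n} + \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N} (2n+1)}$$

$$= \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N/2} n} + e^{-j 2\pi \frac{p}{N}} \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$= A_p + W^p B_p$$

where

$$A_p = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$B_p = \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$W^p = e^{-j 2\pi \frac{p}{N}}$$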
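FAST FOURIER TRANSFORM - 2
$A_p$ and $B_p$ are themselves DFTs, each of length N/2
$A_p$ is the DFT of the sequence f(2n) = {f(0), f(2), ..., f(N-4), f(N-2)}
$B_p$ is the DFT of the sequence f(2n+1) = {f(1), f(3), ..., f(N-3), f(N-1)}
Now let's consider the split summation again and evaluate it at frequency p+N/2

$$F(p+N/2) = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p+N/2}{N/2} n} + e^{-j 2\pi \frac{p+N/2}{N}} \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p+N/2}{N/2} n}$$

But

$$e^{-j 2\pi \frac{p+N/2}{N/2} n} = e^{-j 2\pi \frac{p}{N/2} n}\, e^{-j 2\pi \frac{N/2}{N/2} n} = e^{-j 2\pi \frac{p}{N/2} n}$$

$$e^{-j 2\pi \frac{p+N/2}{N}} = e^{-j 2\pi \frac{p}{N}}\, e^{-j 2\pi \frac{N/2}{N}} = -e^{-j 2\pi \frac{p}{N}}$$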
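FAST FOURIER TRANSFORM - 3
So the simplified form is

$$F(p+N/2) = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N/2} n} - e^{-j 2\pi \frac{p}{N}} \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$= A_p - W^p B_p, \quad \text{where } A_p,\ B_p,\ W^p \text{ were defined before}$$

Let's compare the 2 results

$$F(p) = A_p + W^p B_p$$

$$F(p+N/2) = A_p - W^p B_p$$

So the following FFT butterfly structure may be used
[Figure: FFT butterfly structure]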
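The two butterfly equations can be verified with one level of splitting. This is a minimal sketch, assuming N = 16 and an arbitrary test signal; the half-length DFTs are taken from the built-in fft, which is a shortcut for illustration and not the hardware structure discussed later.

```matlab
% One decimation-in-time step: combine two N/2-point DFTs into an N-point DFT.
N = 16;                                      % example length (assumption)
f = cos(2*pi*3*(0:N-1)/N) + 0.5*(0:N-1);     % arbitrary test signal (assumption)
A = fft(f(1:2:end));                         % A_p: DFT of f(0), f(2), ..., f(N-2)
B = fft(f(2:2:end));                         % B_p: DFT of f(1), f(3), ..., f(N-1)
p = 0:N/2-1;
Wp = exp(-1j*2*pi*p/N);                      % rotation factors W^p
F_low  = A + Wp.*B;                          % F(p),       p = 0..N/2-1
F_high = A - Wp.*B;                          % F(p + N/2), p = 0..N/2-1
F = [F_low, F_high];
err = max(abs(F - fft(f)));                  % expected to be at rounding-error level
```

Each pair (F_low(p), F_high(p)) reuses the same product Wp.*B, which is where the saving comes from.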
FAST FOURIER TRANSFORM - 4
The terms $A_p$ and $B_p$ need to be computed only for p = 0, 1, ..., N/2-1, since F(p+N/2) has also been expressed in terms of $A_p$ and $B_p$
If the computational effort is calculated again
One $A_p$ requires N/2 complex multiplies and N/2-1 complex additions; the same for one $B_p$
So for all N/2 values of $A_p$ and $B_p$, $2(N/2)^2$ complex multiplies and about $2(N/2)^2$ complex additions are needed
N/2 complex multiplies are needed for all $W^p B_p$ and N complex additions for $A_p + W^p B_p$ and $A_p - W^p B_p$
So the total number of complex multiplies is $N^2/2 + N/2$, and the number of complex additions is about $N^2/2 + N$
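A single split roughly halves the effort. As a worked estimate that goes one step beyond the slide (a standard result, stated here as an assumption about where the derivation is heading): when N is a power of 2 and the same split is applied recursively, there are $\log_2 N$ stages of N/2 butterflies, each with 1 complex multiply and 2 complex additions, so

$$\text{complex multiplies} \approx \frac{N}{2}\log_2 N, \qquad \text{complex additions} = N \log_2 N$$

For N = 1024 this gives $512 \cdot 10 = 5120$ complex multiplies and $10240$ complex additions, compared with $1024^2 = 1{,}048{,}576$ complex multiplies for the direct DFT.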
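FAST FOURIER TRANSFORM - 5
To better understand, let's consider N=8

$$A_p = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$B_p = \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N/2} n}$$

Each of these N/2-point DFTs can be split again in the same way

$$A_p = \alpha_p + W_{N/2}^{p}\, \beta_p, \qquad A_{p+N/4} = \alpha_p - W_{N/2}^{p}\, \beta_p$$

$$B_p = \alpha'_p + W_{N/2}^{p}\, \beta'_p, \qquad B_{p+N/4} = \alpha'_p - W_{N/2}^{p}\, \beta'_p$$

where $\alpha_p$ and $\beta_p$ are the N/4-point DFTs of the even- and odd-indexed samples of f(2n), $\alpha'_p$ and $\beta'_p$ are the same for f(2n+1), and

$$W_{N/2}^{p} = e^{-j 2\pi \frac{p}{N/2}} = e^{-j 2\pi \frac{2p}{N}} = W_N^{2p}$$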
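The two-level split for N = 8 can be written out explicitly. The sketch below assumes an arbitrary 8-sample sequence and uses the built-in fft only for the four 2-point DFTs at the bottom level; the variable names a, b, a2, b2 are hypothetical labels for those partial DFTs.

```matlab
% Two levels of decimation in time for N = 8 (illustrative sketch).
N = 8;
f = [1 2 -1 3 0 4 2 -2];               % arbitrary test sequence (assumption)
q  = 0:N/4-1;
W2 = exp(-1j*2*pi*2*q/N);              % W_N^(2p) = W_(N/2)^p
% A_p: DFT of f(0), f(2), f(4), f(6), built from f(0), f(4) and f(2), f(6)
a  = fft(f([1 5]));
b  = fft(f([3 7]));
A  = [a + W2.*b, a - W2.*b];
% B_p: DFT of f(1), f(3), f(5), f(7), built from f(1), f(5) and f(3), f(7)
a2 = fft(f([2 6]));
b2 = fft(f([4 8]));
B  = [a2 + W2.*b2, a2 - W2.*b2];
% Final stage: N-point result from A_p and B_p
p  = 0:N/2-1;
Wp = exp(-1j*2*pi*p/N);
F  = [A + Wp.*B, A - Wp.*B];
err = max(abs(F - fft(f)));            % expected to be at rounding-error level
```

Note that the inputs are consumed in the order f(0), f(4), f(2), f(6), f(1), f(5), f(3), f(7), which is the bit-reversed order of the indices.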
FAST FOURIER TRANSFORM - 6
But where is the magic?
Still only equations
Let's consider an example with N=32
N is small enough to be able to draw and to compute, but
N is large enough to understand the patterns, the computation rules and the procedures
GRAPHICAL REPRESENTATION - 1
ADDRESS GENERATOR - 1
Address table for N = 32. The Index column is the sequential address 0..31, which appears on both sides of every stage; each of the other columns lists the permuted address paired with that index.

Index  Reverse order  Stage 1  Stage 2  Stage 3  Stage 4  Stage 5
  0          0           0        0        0        0        0
  1         16           2        4        8       16        2
  2          8           1        2        2        2        4
  3         24           3        6       10       18        6
  4          4           4        1        4        4        8
  5         20           6        5       12       20       10
  6         12           5        3        6        6       12
  7         28           7        7       14       22       14
  8          2           8        8        1        8       16
  9         18          10       12        9       24       18
 10         10           9       10        3       10       20
 11         26          11       14       11       26       22
 12          6          12        9        5       12       24
 13         22          14       13       13       28       26
 14         14          13       11        7       14       28
 15         30          15       15       15       30       30
 16          1          16       16       16        1        1
 17         17          18       20       24       17        3
 18          9          17       18       18        3        5
 19         25          19       22       26       19        7
 20          5          20       17       20        5        9
 21         21          22       21       28       21       11
 22         13          21       19       22        7       13
 23         29          23       23       30       23       15
 24          3          24       24       17        9       17
 25         19          26       28       25       25       19
 26         11          25       26       19       11       21
 27         27          27       30       27       27       23
 28          7          28       25       21       13       25
 29         23          30       29       29       29       27
 30         15          29       27       23       15       29
 31         31          31       31       31       31       31
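The reverse-order column of the table can be regenerated from the bit patterns of the indices. This is a minimal sketch, assuming N = 32 as in the example; x is a hypothetical stand-in for the time-domain samples.

```matlab
% Bit-reversed addressing for N = 32 (illustrative sketch).
N = 32;
nbits = log2(N);
bits = dec2bin(0:N-1, nbits);              % N x 5 matrix of '0'/'1' characters
rev  = bin2dec(bits(:, nbits:-1:1)).';     % read each row backwards -> reversed index
x = 100 + (0:N-1);                         % stand-in samples (assumption)
x_reordered = x(rev + 1);                  % position k receives sample rev(k)
```

Applying the reversal twice returns the original order, so the same address generator can be used for reading or for writing.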
ADDRESS GENERATOR - 2
Bits are written b1 b2 b3 b4 b5, from most significant to least significant.
Input
Index of time-domain samples: 01100 = 12 (b1 b2 b3 b4 b5)
Reverse bit order: 00110 = 6 (b5 b4 b3 b2 b1)
Stage 1
Index of output: 00110 = 6 (b1 b2 b3 b4 b5)
Index to write: 00101 = 5 (b1 b2 b3 b5 b4)
Stage 2
Index of output: 00110 = 6 (b1 b2 b3 b4 b5)
Index to write: 00011 = 3 (b1 b2 b5 b4 b3)
Stage 3
Index of output: 01110 = 14 (b1 b2 b3 b4 b5)
Index to write: 00111 = 7 (b1 b5 b3 b4 b2)
Stage 4
Index of output: 01101 = 13 (b1 b2 b3 b4 b5)
Index to write: 11100 = 28 (b5 b2 b3 b4 b1)
Stage 5
Index of output: 01101 = 13 (b1 b2 b3 b4 b5)
Index of frequency-domain samples: 10110 = 22 (b5 b1 b2 b3 b4)
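The per-stage write-address rules are plain bit permutations, so they can be replayed on the example indices. The sketch below assumes the bit numbering b1 (most significant) to b5 (least significant); the cell array perm and the vector out_idx are hypothetical names introduced for illustration.

```matlab
% Write-address generation per stage as a bit permutation (illustrative sketch).
perm = {[1 2 3 5 4], ...    % stage 1: swap b4 and b5
        [1 2 5 4 3], ...    % stage 2: swap b3 and b5
        [1 5 3 4 2], ...    % stage 3: swap b2 and b5
        [5 2 3 4 1], ...    % stage 4: swap b1 and b5
        [5 1 2 3 4]};       % stage 5: rotate right by one bit
out_idx = [6 6 14 13 13];   % example output indices, one per stage
wr_idx  = zeros(1, 5);
for s = 1:5
    b = dec2bin(out_idx(s), 5);         % 1 x 5 character vector b1..b5
    wr_idx(s) = bin2dec(b(perm{s}));    % reorder the bits, convert back
end
```

Stages 1 to 4 only swap two bits, so each of those permutations is its own inverse; stage 5 is a rotation and is not.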
ROTATION FACTORS GENERATOR - 1
For an N-point FFT, N/2 rotation factors are needed for each stage
But not all are distinct
ROTATION FACTORS GENERATOR - 2
So all N/2 distinct rotation factors are pre-computed and stored
Then, at each stage, an address generator will provide the read address (here for N = 32)
At stage 1 the only generated address is 0
At stage 2 the only generated addresses are 0 and 8
At stage 3 the only generated addresses are 0, 4, 8 and 12
At stage 4 the only generated addresses are 0, 2, 4, 6, 8, 10, 12 and 14
At stage 5 the generated addresses are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
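The "jumped" read addresses follow a simple stride rule. This is a minimal sketch, assuming N = 32 and zero-based addresses into a table of N/2 stored rotation factors; the names addr, step and coef are hypothetical.

```matlab
% Distinct rotation-factor addresses per stage (illustrative sketch).
N = 32;
nstages = log2(N);
addr = cell(1, nstages);
for stage = 1:nstages
    step = (N/2) / 2^(stage-1);               % stride: 16, 8, 4, 2, 1
    t = 0:N/2-1;                              % butterfly number within the stage
    addr{stage} = unique(mod(t*step, N/2));   % distinct addresses used at this stage
end
coef = exp(-1j*2*pi*(0:N/2-1)/N);             % the N/2 stored rotation factors
```

The stride halves from one stage to the next, so the number of distinct addresses doubles until all N/2 factors are used in the last stage.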
IMPLEMENTATION - 1
IMPLEMENTATION - 2
Address Generator
Will generate
The reverse order write address for the input before Stage 1
The rotated write address after each Stage
The normal read address before each Stage
The jumped read address for Rotation Factors
The shifted write address after the last Stage for the output
Butterfly
Will compute
The complex multiplication
The 2 complex additions
Mem Data
Stores the initial data and all the intermediate data after each Stage
Mem Rotation Factors
Stores the off-line computed Rotation Factors
Control
Knows when a new FFT starts
Knows what Stage is ongoing
Knows the position in the Stage
MATLAB - 1
function [Xor, Xoi] = butterfly(Xir, Xii, Coef)
% Radix-2 butterfly: Xir and Xii are the two complex inputs, Coef is the
% rotation factor, Xor and Xoi are the two complex outputs.
Xor = Xir + Coef*Xii;
Xoi = Xir - Coef*Xii;
%%%% For Implementation
% method 1: 1 complex multiply = 4 real multiplies and 2 real additions
% (a+jb)(c+jd) gives
% real = ac - bd
% imag = ad + bc
% method 2: 1 complex multiply = 3 real multiplies and 5 real additions
% (a+jb)(c+jd) gives
% real = (c-d)b + c(a-b)
% imag = (c+d)a - c(a-b)
end
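The two ways of forming a complex product listed in the comments can be compared on real numbers. This is a minimal sketch with arbitrary example values for a, b, c and d; the shared term k is a hypothetical name for the product c(a-b) that method 2 reuses.

```matlab
% Complex multiply (a + jb)(c + jd) with 4 or with 3 real multiplies (sketch).
a = 3; b = -2; c = 0.5; d = 4;      % arbitrary example values (assumption)
% method 1: 4 real multiplies, 2 real additions
re1 = a*c - b*d;
im1 = a*d + b*c;
% method 2: 3 real multiplies, 5 real additions
k   = c*(a - b);                    % shared product
re2 = (c - d)*b + k;
im2 = (c + d)*a - k;
ref = (a + 1j*b)*(c + 1j*d);        % built-in complex product for comparison
```

When c and d are stored rotation factors, c - d and c + d can also be stored off-line, which removes two of the five additions at the cost of extra memory.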
MATLAB - 2
clc, clear all, close all
N = 32;
Nr_st = log2(N);                      % number of stages
in_x = sin(2*pi*1000/8000*(0:N-1)) + sin(2*pi*500/8000*(0:N-1));
% reference zone
figure, stem(in_x)
X = fft(in_x, N);
f = (-N/2 : N/2-1) / N;               % normalized frequency of each bin after fftshift
figure(2), subplot(2,1,1), stem(f, fftshift(abs(X)))
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% initialization zone - the reverse order write address
index_init = dec2bin(0:N-1, Nr_st);   % N x Nr_st matrix of bit characters
index_rob = index_init(:, end:-1:1);  % bits in reverse order
index_ro = bin2dec(index_rob);
in_x_rot = zeros(1, N);
for k = 1 : N
    in_x_rot(k) = in_x(index_ro(k)+1);
end
x_out_inter = zeros(N,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% rotation factors matrix
Coef = zeros(1, N/2);
for t = 1 : N/2
    Coef(t) = exp(-1j*2*pi/N * (t-1));
end
MATLAB - 3
for stage = 1 : Nr_st
    for t = 1 : N/2
        % read address generation for rotation factors - the "jumped" one
        % (1-based index into Coef, wrapping every N/2 entries)
        addrw = mod((t-1) * (N/2 / 2^(stage-1)), N/2) + 1;
        % butterfly usage
        [x_out_inter(2*t-1), x_out_inter(2*t)] = butterfly(in_x_rot(2*t-1), in_x_rot(2*t), Coef(addrw));
    end
    % write data address generation - the "rotated" one, and after Stage 5
    % the "shifted" one
    index = index_init;
    switch stage
        case 1
            index_scriere = [index(:,1:3), index(:,5), index(:,4)];
        case 2
            index_scriere = [index(:,1:2), index(:,5), index(:,4), index(:,3)];
        case 3
            index_scriere = [index(:,1), index(:,5), index(:,3:4), index(:,2)];
        case 4
            index_scriere = [index(:,5), index(:,2:4), index(:,1)];
        case 5
            index_scriere = [index(:,2:5), index(:,1)];
    end
    index_rot = bin2dec(index_scriere);
    % write data in memory for the next stage
    for k = 1 : N
        in_x_rot(k) = x_out_inter(index_rot(k)+1);
    end
end
out_x = in_x_rot;
figure(2), subplot(2,1,2), stem(f, fftshift(abs(out_x)), 'r')
MATLAB - 4
[Figure 1: stem plot "32 samples of input signal". Figure 2: "FFT in Matlab - reference" (top) and "Radix 2 implementation" (bottom), magnitude versus normalized frequency from -0.5 to 0.5.]
EXTRA STUFF: RADIX 4 DIT ALGORITHM
RADIX 4 DIT ALGORITHM - 1
If $N = 4^p$ then a Radix 4 DIT algorithm can be used (using the same method as for Radix 2, the complexity can be computed)

$$F(p) = \sum_{n=0}^{N-1} f(n)\, W_N^{np}$$

$$= \sum_{n=0}^{N/4-1} f(4n)\, W_N^{4np} + \sum_{n=0}^{N/4-1} f(4n+1)\, W_N^{(4n+1)p} + \sum_{n=0}^{N/4-1} f(4n+2)\, W_N^{(4n+2)p} + \sum_{n=0}^{N/4-1} f(4n+3)\, W_N^{(4n+3)p}$$

$$= \sum_{n=0}^{N/4-1} f(4n)\, W_N^{4np} + W_N^{p} \sum_{n=0}^{N/4-1} f(4n+1)\, W_N^{4np} + W_N^{2p} \sum_{n=0}^{N/4-1} f(4n+2)\, W_N^{4np} + W_N^{3p} \sum_{n=0}^{N/4-1} f(4n+3)\, W_N^{4np}$$

$$= \sum_{n=0}^{N/4-1} f(4n)\, W_{N/4}^{np} + W_N^{p} \sum_{n=0}^{N/4-1} f(4n+1)\, W_{N/4}^{np} + W_N^{2p} \sum_{n=0}^{N/4-1} f(4n+2)\, W_{N/4}^{np} + W_N^{3p} \sum_{n=0}^{N/4-1} f(4n+3)\, W_{N/4}^{np}$$

$$= P(p) + W_N^{p}\, Q(p) + W_N^{2p}\, R(p) + W_N^{3p}\, S(p), \quad p = 0, 1, ..., N-1$$

RADIX 4 DIT ALGORITHM - 2
P(p), Q(p), R(p) and S(p) are each an N/4-point DFT
Although p = 0, 1, ..., N-1, each sum needs to be computed only for p = 0, 1, ..., N/4-1, since they are periodic with period N/4
The transform F(p) can be broken into 4 parts as below

$$F(p) = P(p) + W_N^{p}\, Q(p) + W_N^{2p}\, R(p) + W_N^{3p}\, S(p)$$

$$F(p + N/4) = P(p) + W_N^{p+N/4}\, Q(p) + W_N^{2(p+N/4)}\, R(p) + W_N^{3(p+N/4)}\, S(p)$$

$$= P(p) - j W_N^{p}\, Q(p) - W_N^{2p}\, R(p) + j W_N^{3p}\, S(p)$$

$$F(p + 2N/4) = P(p) + W_N^{p+2N/4}\, Q(p) + W_N^{2(p+2N/4)}\, R(p) + W_N^{3(p+2N/4)}\, S(p)$$

$$= P(p) - W_N^{p}\, Q(p) + W_N^{2p}\, R(p) - W_N^{3p}\, S(p)$$

$$F(p + 3N/4) = P(p) + W_N^{p+3N/4}\, Q(p) + W_N^{2(p+3N/4)}\, R(p) + W_N^{3(p+3N/4)}\, S(p)$$

$$= P(p) + j W_N^{p}\, Q(p) - W_N^{2p}\, R(p) - j W_N^{3p}\, S(p)$$

with p = 0, 1, ..., N/4-1
RADIX 4 DIT ALGORITHM - 3

$$W_b = \exp\left(-j \frac{2\pi}{N} p\right), \quad W_c = \exp\left(-j \frac{2\pi}{N} 2p\right), \quad W_d = \exp\left(-j \frac{2\pi}{N} 3p\right), \quad p = 0, 1, ..., N/4-1$$
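The four radix-4 output equations can be checked with one level of splitting. This is a minimal sketch, assuming N = 16 and an arbitrary test signal; the four N/4-point DFTs are taken from the built-in fft for brevity, and F0 to F3 are hypothetical names for the four output quarters.

```matlab
% One radix-4 decimation-in-time step (illustrative sketch).
N = 16;                                       % example length (assumption)
f = sin(2*pi*(0:N-1)/8) + 0.25*(0:N-1);       % arbitrary test signal (assumption)
P = fft(f(1:4:end));                          % DFT of f(4n)
Q = fft(f(2:4:end));                          % DFT of f(4n+1)
R = fft(f(3:4:end));                          % DFT of f(4n+2)
S = fft(f(4:4:end));                          % DFT of f(4n+3)
p  = 0:N/4-1;
Wb = exp(-1j*2*pi*p/N);                       % W_N^p
Wc = exp(-1j*2*pi*2*p/N);                     % W_N^(2p)
Wd = exp(-1j*2*pi*3*p/N);                     % W_N^(3p)
F0 = P + Wb.*Q + Wc.*R + Wd.*S;               % F(p)
F1 = P - 1j*Wb.*Q - Wc.*R + 1j*Wd.*S;         % F(p + N/4)
F2 = P - Wb.*Q + Wc.*R - Wd.*S;               % F(p + 2N/4)
F3 = P + 1j*Wb.*Q - Wc.*R - 1j*Wd.*S;         % F(p + 3N/4)
F  = [F0, F1, F2, F3];
err = max(abs(F - fft(f)));                   % expected to be at rounding-error level
```

The factors -j, -1 and j cost no real multiplies, only swaps and sign changes of real and imaginary parts, so one radix-4 butterfly needs just the three products Wb.*Q, Wc.*R and Wd.*S.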
RADIX 4 DIT ALGORITHM - 4
Example for $N = 16 = 4^2$
[Figure: two-stage radix-4 signal-flow graph (Stage 0, Stage 1). Inputs, top to bottom: f(0), f(4), f(8), f(12), f(1), f(5), f(9), f(13), f(2), f(6), f(10), f(14), f(3), f(7), f(11), f(15). Outputs, top to bottom: F(0), F(1), ..., F(15).]
ADDRESS GENERATOR - 1
ADDRESS GENERATOR - 2
Each index is shown as binary = base-4 digits = decimal.
Input
Index of time-domain samples: 0110 = 1 2 = 6
Reverse order digit (inverse all digits): 1001 = 2 1 = 9
Stage 1
Index of output: 1110 = 3 2 = 14
Index to write (inverse last 2 digits): 1011 = 2 3 = 11
Stage 2
Index of output: 0010 = 0 2 = 2
Index of frequency-domain samples (split outputs): 1000 = 2 0 = 8
Try the same for N=64! It will make sense.
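Digit reversal in base 4 can be computed arithmetically instead of through bit strings. This is a minimal sketch for N = 16, where every index has exactly two base-4 digits; the names d_hi, d_lo and rev4 are hypothetical.

```matlab
% Base-4 digit reversal for N = 16 (illustrative sketch).
N = 16;
idx  = 0:N-1;
d_hi = floor(idx/4);        % most significant base-4 digit
d_lo = mod(idx, 4);         % least significant base-4 digit
rev4 = 4*d_lo + d_hi;       % same digits in swapped order
```

For N = 64 each index has three base-4 digits, and the same idea applies with the first and last digit exchanged.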
MATLAB -1
function [Ao, Bo, Co, Do] = butterfly_4(Ai, Bi, Ci, Di, Coef2, Coef3, Coef4)
% Radix-4 butterfly: Coef2, Coef3, Coef4 are the rotation factors applied to Bi, Ci, Di.
Ao = Ai + Coef2*Bi + Coef3*Ci + Coef4*Di;
Bo = Ai - 1j*Coef2*Bi - Coef3*Ci + 1j*Coef4*Di;
Co = Ai - Coef2*Bi + Coef3*Ci - Coef4*Di;
Do = Ai + 1j*Coef2*Bi - Coef3*Ci - 1j*Coef4*Di;
end
%%%%%%%%%%%%%%%%
% MAIN
clc, clear all, close all

N = 16;
Nr_st = log2(N)/2;                    % number of radix-4 stages
in_x = sin(2*pi*1000/8000*(0:N-1)) + sin(2*pi*500/8000*(0:N-1));
% reference zone
figure, stem(in_x), title('16 samples of input signal')
X = fft(in_x, N);
f = (-N/2 : N/2-1) / N;               % normalized frequency of each bin after fftshift
figure(2), subplot(2,1,1), stem(f, fftshift(abs(X))), title(' FFT in Matlab - reference'), xlabel('normalized frequency')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% initialization zone - the digit reverse order write address
index_init = dec2bin(0:N-1, log2(N)); % N x 4 matrix of bit characters
index_rob = [index_init(:,3:4), index_init(:,1:2)];  % swap the two base-4 digits
index_ro = bin2dec(index_rob);
in_x_rot = zeros(1, N);
for k = 1 : N
    in_x_rot(k) = in_x(index_ro(k)+1);
end
MATLAB -2
x_out_inter = zeros(N,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% rotation factors matrix
Coef = zeros(1, N);
for t = 1 : N
    Coef(t) = exp(-1j*2*pi/N * (t-1));
end
% address for coef: one row per stage, one column per butterfly
addr_coef_ind = [ 0 0 0 0
                  0 1 2 3 ];
for stage = 1 : Nr_st
    for t = 1 : N/4
        addr = addr_coef_ind(stage, t);
        % butterfly usage
        [x_out_inter(4*t-3), x_out_inter(4*t-2), x_out_inter(4*t-1), x_out_inter(4*t)] = ...
            butterfly_4(in_x_rot(4*t-3), in_x_rot(4*t-2), in_x_rot(4*t-1), in_x_rot(4*t), Coef(addr*1 + 1), Coef(addr*2 + 1), Coef(addr*3 + 1));
    end
    % write address generation: both stages swap the two base-4 digits
    index_scriere = [index_init(:,3:4), index_init(:,1:2)];
    index_rot = bin2dec(index_scriere);
    % write data in memory for the next stage
    for k = 1 : N
        in_x_rot(k) = x_out_inter(index_rot(k)+1);
    end
end
out_x = in_x_rot;
figure(2), subplot(2,1,2), stem(f, fftshift(abs(out_x)), 'r'), title(' Radix 4 implementation'), xlabel('normalized frequency')
MATLAB -3
[Figure 1: stem plot "16 samples of input signal". Figure 2: "FFT in Matlab - reference" (top) and "Radix 4 implementation" (bottom), magnitude versus normalized frequency from -0.5 to 0.5.]
