
FAST FOURIER TRANSFORM

The magic behind the equations

INTRODUCTION - 1

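Fourier Transform

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$$

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega$$

Discrete Time Fourier Transform

$$F(\omega) = \sum_{n=0}^{N-1} f(n)\, e^{-j\omega n}, \quad n = 0..N-1$$

Discrete Fourier Transform

$$F(p) = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi \frac{p}{M} n}, \quad n = 0..N-1,\ p = 0..M-1$$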

INTRODUCTION - 2

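When N=M

$$F(p) = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi \frac{p}{N} n}, \quad n, p = 0..N-1$$

$$f(n) = \frac{1}{N} \sum_{p=0}^{N-1} F(p)\, e^{j 2\pi \frac{p}{N} n}$$

Let's evaluate the computational effort. With

$$W = e^{-j \frac{2\pi}{N}}$$

the transform pair becomes

$$F(p) = \sum_{n=0}^{N-1} f(n)\, W^{np}, \quad n, p = 0..N-1$$

$$f(n) = \frac{1}{N} \sum_{p=0}^{N-1} F(p)\, W^{-np}$$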
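The transform pair above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names `dft` and `idft` and the test signal are illustrative assumptions. It evaluates the two sums literally, using W = exp(-j*2*pi/N).

```python
import cmath

def dft(f):
    """Direct DFT: F(p) = sum over n of f(n) * W**(n*p), W = exp(-j*2*pi/N)."""
    N = len(f)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(f[n] * W ** (n * p) for n in range(N)) for p in range(N)]

def idft(F):
    """Inverse DFT: f(n) = (1/N) * sum over p of F(p) * W**(-n*p)."""
    N = len(F)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(F[p] * W ** (-n * p) for p in range(N)) / N for n in range(N)]

# Hypothetical 4-sample signal, used only for illustration
signal = [1.0, 2.0, 0.0, -1.0]
spectrum = dft(signal)
recovered = idft(spectrum)
```

Applying `idft` to the output of `dft` should return the original samples up to rounding error, which is a quick way to confirm the sign convention of the exponent.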

INTRODUCTION - 3

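The following applies

$$\begin{bmatrix} F(0) \\ F(1) \\ \vdots \\ F(N-1) \end{bmatrix} =
\begin{bmatrix}
W^{0} & W^{0} & W^{0} & \cdots & W^{0} \\
W^{0} & W^{1} & W^{2} & \cdots & W^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
W^{0} & W^{N-1} & W^{2(N-1)} & \cdots & W^{(N-1)^2}
\end{bmatrix}
\begin{bmatrix} f(0) \\ f(1) \\ \vdots \\ f(N-1) \end{bmatrix}$$

where the NxN matrix for W contains complex numbers, but these numbers can be computed off-line and stored

For each F(p) there are made
  N complex multiplications
  N-1 complex additions

And there are N values of F(p), so
  N^2 complex multiplications
  N(N-1) complex additions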

INTRODUCTION - 4

Although the first row and the first column of the matrix are all 1, meaning that some effort is saved, still
  1 complex multiply uses 4 real multiplies and 2 real additions
  1 complex addition uses 2 real additions

This results in almost
  4N^2 real multiplies
  2N^2 + 2N(N-1) real additions

So the computational effort is proportional to N^2

For a DFT (Discrete Fourier Transform) of a sequence of 1024 samples, about a million complex multiplications are needed (1024^2 = 1,048,576)
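The counts above are easy to tabulate. This is a small Python sketch under the slide's own assumptions (4 real multiplies and 2 real additions per complex multiply, 2 real additions per complex addition, no savings from the rows and columns of ones); the function name `direct_dft_cost` is illustrative.

```python
def direct_dft_cost(N):
    """Operation counts for a direct N-point DFT, following the slide's model."""
    complex_mults = N * N
    complex_adds = N * (N - 1)
    return {
        "complex_mults": complex_mults,
        "complex_adds": complex_adds,
        # each complex multiply: 4 real multiplies + 2 real additions
        "real_mults": 4 * complex_mults,
        # each complex addition: 2 real additions
        "real_adds": 2 * complex_mults + 2 * complex_adds,
    }

cost = direct_dft_cost(1024)
print(cost)
```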

FAST FOURIER TRANSFORM - 1

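Let's take the basic DFT and split the summation in 2 parts, one for even n and one for odd n

$$F(p) = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N} 2n} + \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N} (2n+1)}$$

$$= \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N/2} n} + e^{-j 2\pi \frac{p}{N}} \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$= A_p + W^p B_p$$

where

$$A_p = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$B_p = \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$W^p = e^{-j 2\pi \frac{p}{N}}$$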

FAST FOURIER TRANSFORM - 2

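A_p and B_p are themselves DFTs, each of length N/2

A_p is the DFT of the sequence f(2n) = {f(0), f(2), ..., f(N-4), f(N-2)}

B_p is the DFT of the sequence f(2n+1) = {f(1), f(3), ..., f(N-3), f(N-1)}

Now let's consider again the split summation and evaluate it at frequency p+N/2

$$F(p+N/2) = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p+N/2}{N/2} n} + e^{-j 2\pi \frac{p+N/2}{N}} \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p+N/2}{N/2} n}$$

But, since e^{-j 2 pi n} = 1 for integer n and e^{-j pi} = -1,

$$e^{-j 2\pi \frac{p+N/2}{N/2} n} = e^{-j 2\pi \frac{p}{N/2} n}\, e^{-j 2\pi \frac{N/2}{N/2} n} = e^{-j 2\pi \frac{p}{N/2} n}$$

$$e^{-j 2\pi \frac{p+N/2}{N}} = e^{-j 2\pi \frac{p}{N}}\, e^{-j 2\pi \frac{N/2}{N}} = -e^{-j 2\pi \frac{p}{N}}$$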

FAST FOURIER TRANSFORM - 3

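So the simplified form is

$$F(p+N/2) = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N/2} n} - e^{-j 2\pi \frac{p}{N}} \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N/2} n} = A_p - W^p B_p$$

where A_p, B_p, W^p were defined before

Let's compare the 2 results

$$F(p) = A_p + W^p B_p$$

$$F(p+N/2) = A_p - W^p B_p$$

So the FFT butterfly structure may be used: the product W^p B_p is computed once, then added to A_p and subtracted from A_p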
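The two butterfly equations can be applied recursively, because A_p and B_p are themselves DFTs of half the length. The following is a minimal Python sketch of that idea, assuming N is a power of 2; the function name `fft_radix2` is illustrative, and this recursive form is not the in-place, stage-by-stage structure developed later in the slides.

```python
import cmath

def fft_radix2(f):
    """Recursive radix-2 decimation-in-time FFT (len(f) must be a power of 2)."""
    N = len(f)
    if N == 1:
        return list(f)
    A = fft_radix2(f[0::2])   # DFT of the even-indexed samples f(2n)
    B = fft_radix2(f[1::2])   # DFT of the odd-indexed samples f(2n+1)
    F = [0j] * N
    for p in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * p / N) * B[p]   # W^p * B_p, computed once
        F[p] = A[p] + t             # F(p)       = A_p + W^p B_p
        F[p + N // 2] = A[p] - t    # F(p + N/2) = A_p - W^p B_p
    return F

# Hypothetical 8-sample input, used only for illustration
example = fft_radix2([0.5, -1.0, 2.0, 3.0, 0.0, 1.5, -2.5, 4.0])
```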

FAST FOURIER TRANSFORM - 4

The terms A_p and B_p need to be computed only for p = 0, 1, ..., N/2-1, since F(p+N/2) has also been expressed in terms of A_p and B_p

If the computational effort is calculated again

One A_p requires N/2 complex multiplies and N/2-1 complex additions; the same for one B_p

So for all N/2 values of A_p and B_p, 2(N/2)^2 complex multiplies and about 2(N/2)^2 complex additions are needed

N/2 complex multiplies are needed for all W^p B_p and N complex additions for A_p + W^p B_p and A_p - W^p B_p

So the total number of complex multiplies is N^2/2 + N/2, and the total number of complex additions is about N^2/2 + N
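To see how much a single split saves, the counts can be compared numerically. This Python sketch uses the slide's formulas for the direct DFT and for one even/odd split; the function names are illustrative. The last function is an addition that is not stated on the slide: it assumes the same split is applied recursively down to length 1 and counts only the W^p B_p multiplies.

```python
def cost_direct(N):
    """(complex multiplies, complex additions) for the direct DFT."""
    return N * N, N * (N - 1)

def cost_one_split(N):
    """(complex multiplies, complex additions) after one even/odd split,
    using the slide's rounded figures."""
    half = N // 2
    mults = 2 * half * half + half   # two half-length DFTs + N/2 products W^p B_p
    adds = 2 * half * half + N       # two half-length DFTs + N butterfly additions
    return mults, adds

def full_fft_multiplies(N):
    """Assumption: the split is repeated recursively, M(N) = 2*M(N/2) + N/2."""
    return 0 if N == 1 else 2 * full_fft_multiplies(N // 2) + N // 2

print(cost_direct(1024), cost_one_split(1024), full_fft_multiplies(1024))
```

One split roughly halves the work; repeating it at every level is what brings the multiply count from N^2 down to (N/2) log2 N.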

FAST FOURIER TRANSFORM - 5

To better understand, let's consider N=8

$$A_p = \sum_{n=0}^{N/2-1} f(2n)\, e^{-j 2\pi \frac{p}{N/2} n}$$

$$B_p = \sum_{n=0}^{N/2-1} f(2n+1)\, e^{-j 2\pi \frac{p}{N/2} n}$$

FAST FOURIER TRANSFORM - 6


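$$A_p = \alpha_p + W_{N/2}^{p}\, \beta_p, \quad A_{p+N/4} = \alpha_p - W_{N/2}^{p}\, \beta_p$$

$$B_p = \alpha'_p + W_{N/2}^{p}\, \beta'_p, \quad B_{p+N/4} = \alpha'_p - W_{N/2}^{p}\, \beta'_p$$

$$W_{N/2}^{p} = e^{-j 2\pi \frac{p}{N/2}} = e^{-j 2\pi \frac{2p}{N}} = W_N^{2p}$$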

But where is the magic?

Still only equations

Let's consider an example of N=32

N is small enough to be able to draw and to compute, but N is large enough to understand the patterns, the computation rules and the procedures

GRAPHICAL REPRESENTATION - 1

GRAPHICAL REPRESENTATION - 2

GRAPHICAL REPRESENTATION - 3

ADDRESS GENERATOR -1
Address table for N=32. "Index" is the sequential address 0..31; every "Out" column of the table equals this sequential index, so it is listed only once. "Reverse order" is the bit-reversed address. Each "In" column is the permuted address used after the given stage.

Index  Reverse  In after  In after  In after  In after  In after
       order    Stage 1   Stage 2   Stage 3   Stage 4   Stage 5
0      0        0         0         0         0         0
1      16       2         4         8         16        2
2      8        1         2         2         2         4
3      24       3         6         10        18        6
4      4        4         1         4         4         8
5      20       6         5         12        20        10
6      12       5         3         6         6         12
7      28       7         7         14        22        14
8      2        8         8         1         8         16
9      18       10        12        9         24        18
10     10       9         10        3         10        20
11     26       11        14        11        26        22
12     6        12        9         5         12        24
13     22       14        13        13        28        26
14     14       13        11        7         14        28
15     30       15        15        15        30        30
16     1        16        16        16        1         1
17     17       18        20        24        17        3
18     9        17        18        18        3         5
19     25       19        22        26        19        7
20     5        20        17        20        5         9
21     21       22        21        28        21        11
22     13       21        19        22        7         13
23     29       23        23        30        23        15
24     3        24        24        17        9         17
25     19       26        28        25        25        19
26     11       25        26        19        11        21
27     27       27        30        27        27        23
28     7        28        25        21        13        25
29     23       30        29        29        29        27
30     15       29        27        23        15        29
31     31       31        31        31        31        31
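The "Reverse order" column of the table can be generated by reversing the 5-bit binary representation of each index. Below is a minimal Python sketch; the function name `bit_reverse` and the variable `reverse_order` are illustrative assumptions, not names from the slides.

```python
def bit_reverse(index, bits):
    """Return `index` with its `bits`-wide binary representation reversed."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # move the lowest bit of index into result
        index >>= 1
    return result

# N = 32 needs log2(32) = 5 bits per address
reverse_order = [bit_reverse(i, 5) for i in range(32)]
print(reverse_order)
```

Bit reversal is its own inverse, so the same function converts a reversed address back to the sequential one.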

ADDRESS GENERATOR -2

Example of the addresses produced at each step (b1 is the most significant bit, b5 the least significant)

Step     Quantity                           Binary  Decimal  Bit order
Input    Index of time-domain samples       01100   12       b1 b2 b3 b4 b5
         Reverse order bit                  00110   6        b5 b4 b3 b2 b1
Stage 1  Index of output                    00110   6        b1 b2 b3 b4 b5
         Index to write                     00101   5        b1 b2 b3 b5 b4
Stage 2  Index of output                    00110   6        b1 b2 b3 b4 b5
         Index to write                     00011   3        b1 b2 b5 b4 b3
Stage 3  Index of output                    01110   14       b1 b2 b3 b4 b5
         Index to write                     00111   7        b1 b5 b3 b4 b2
Stage 4  Index of output                    01101   13       b1 b2 b3 b4 b5
         Index to write                     11100   28       b5 b2 b3 b4 b1
Stage 5  Index of output                    01101   13       b1 b2 b3 b4 b5
         Index of frequency-domain samples  10110   22       b5 b1 b2 b3 b4
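Each write address is just a re-ordering of the five index bits. The Python sketch below encodes one bit order per stage and reproduces the example values; the names `PATTERNS` and `write_index` are illustrative. The bit orders for Stage 1 and Stage 3 are taken from the MATLAB listing later in the deck, which is an assumption about what the slide showed.

```python
# For each stage: which source bit (b1 = MSB ... b5 = LSB) lands in each position
PATTERNS = {
    1: (1, 2, 3, 5, 4),   # swap b4 and b5
    2: (1, 2, 5, 4, 3),   # swap b3 and b5
    3: (1, 5, 3, 4, 2),   # swap b2 and b5
    4: (5, 2, 3, 4, 1),   # swap b1 and b5
    5: (5, 1, 2, 3, 4),   # rotate right: b5 becomes the MSB
}

def write_index(stage, output_index):
    """Map the index of a butterfly output to the address where it is written."""
    bits = format(output_index, "05b")            # bits[0] is b1
    reordered = "".join(bits[b - 1] for b in PATTERNS[stage])
    return int(reordered, 2)

for stage, index in [(1, 6), (2, 6), (3, 14), (4, 13), (5, 13)]:
    print(stage, index, write_index(stage, index))
```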

ROTATION FACTORS GENERATOR - 1

For an N-point FFT, N/2 rotation factors are needed for each stage
But not all are distinct

ROTATION FACTORS GENERATOR - 2

So all N/2 distinct rotation factors are pre-computed and stored

Then, at each stage, an address generator will provide the read address (values below are for N=32)

At stage 1 the only generated address is 0

At stage 2 the only generated addresses are 0 and 8

At stage 3 the only generated addresses are 0, 4, 8 and 12

At stage 4 the only generated addresses are 0, 2, 4, 6, 8, 10, 12 and 14

At stage 5 the generated addresses are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
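The "jumped" read addresses follow a simple rule: at each stage the step between consecutive addresses halves, and the address wraps around modulo N/2. The following Python sketch is one way to express this, assuming N=32 and a table of N/2 stored factors; `rotation_table` and `rotation_addresses` are illustrative names.

```python
import cmath

def rotation_table(N):
    """The N/2 distinct rotation factors W^k = exp(-j*2*pi*k/N), k = 0..N/2-1."""
    return [cmath.exp(-2j * cmath.pi * k / N) for k in range(N // 2)]

def rotation_addresses(stage, N=32):
    """Read address into the rotation table for each of the N/2 butterflies."""
    step = (N // 2) >> (stage - 1)       # 16, 8, 4, 2, 1 for stages 1..5
    return [(t * step) % (N // 2) for t in range(N // 2)]

table = rotation_table(32)
for stage in range(1, 6):
    print(stage, sorted(set(rotation_addresses(stage))))
```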

IMPLEMENTATION - 1

IMPLEMENTATION - 2

Address Generator

Will generate
  The reverse order write address for the input before Stage 1
  The rotated write address after each Stage
  The normal read address before each Stage
  The jumped read address for Rotation Factors
  The shifted write address after the last Stage for the output

Butterfly

Will compute
  The complex multiplication
  The 2 complex additions

Mem Data

Stores the initial data and all the intermediate data after each Stage

Mem Rotation Factors

Stores the off-line computed Rotation Factors

Control

Knows when a new FFT starts
Knows what Stage is ongoing
Knows the position in the Stage

MATLAB - 1
function [Xor, Xoi] = butterfly(Xir, Xii, Coef)
% Radix-2 butterfly: Xir and Xii are the two complex inputs,
% Coef is the rotation factor applied to the second input
T = Coef*Xii;      % the single complex multiplication
Xor = Xir + T;
Xoi = Xir - T;
%%%% For Implementation
% method 1: 1 complex multiply = 4 real multiplies and 2 real additions
% (a+jb)(c+jd) gives
% real = ac - bd
% imag = ad + bc
% method 2: 1 complex multiply = 3 real multiplies and 5 real additions
% (a+jb)(c+jd) gives
% real = (c-d)b + c(a-b)
% imag = (c+d)a - c(a-b)
end
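The "method 2" comment can be verified directly: the product c(a-b) is shared between the real and imaginary parts, so only three real multiplications are needed. A minimal Python sketch follows, with the illustrative names `complex_multiply_4` and `complex_multiply_3`.

```python
def complex_multiply_4(a, b, c, d):
    """(a + jb)(c + jd) with 4 real multiplies and 2 real additions."""
    return a * c - b * d, a * d + b * c

def complex_multiply_3(a, b, c, d):
    """(a + jb)(c + jd) with 3 real multiplies and 5 real additions."""
    shared = c * (a - b)            # multiply 1, reused in both parts
    real = (c - d) * b + shared     # multiply 2
    imag = (c + d) * a - shared     # multiply 3
    return real, imag

# (1 + 2j)(3 + 4j) = -5 + 10j
print(complex_multiply_4(1, 2, 3, 4), complex_multiply_3(1, 2, 3, 4))
```

In hardware the trade is one multiplier for three extra adders, which is usually worthwhile because multipliers are the more expensive resource.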

MATLAB - 2
clc, clear all, close all
N = 32;
Nr_st = log2(N);                          % number of radix-2 stages
n = 0:N-1;
in_x = sin(2*pi*1000/8000*n) + sin(2*pi*500/8000*n);
% reference zone
figure, stem(in_x)
X = fft(in_x, N);
f = linspace(-0.5, 0.5, N);
figure(2), subplot(2,1,1), stem(f, fftshift(abs(X)))
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% initialization zone - the reverse order write address
index_init = dec2bin(0:N-1, Nr_st);       % one row of Nr_st bits per index
index_rob = index_init(:, end:-1:1);      % reverse the bit order
index_ro = bin2dec(index_rob);
in_x_rot = zeros(1, N);
for k = 1 : N
    in_x_rot(k) = in_x(index_ro(k)+1);
end
x_out_inter = zeros(N,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% rotation factors matrix: the N/2 distinct factors W^0 .. W^(N/2-1)
Coef = zeros(1, N/2);
for t = 1 : N/2
    Coef(t) = exp(-1j*2*pi/N * (t-1));
end

MATLAB - 3
for stage = 1 : Nr_st
    for t = 1 : N/2
        % read address generation for rotation factors - the "jumped" one
        % (1-based, wraps around after N/2 entries)
        addrw = mod((t-1) * (N/2 / 2^(stage-1)), N/2) + 1;
        % butterfly usage
        [x_out_inter(2*t-1), x_out_inter(2*t)] = butterfly(in_x_rot(2*t-1), in_x_rot(2*t), Coef(addrw));
    end
    % write data address generation - the "rotated" one, and after Stage 5
    % the "shifted" one (bit patterns below are specific to N=32)
    index = index_init;
    switch stage
        case 1
            index_scriere = [index(:,1:3), index(:,5), index(:,4)];
        case 2
            index_scriere = [index(:,1:2), index(:,5), index(:,4), index(:,3)];
        case 3
            index_scriere = [index(:,1), index(:,5), index(:,3:4), index(:,2)];
        case 4
            index_scriere = [index(:,5), index(:,2:4), index(:,1)];
        case 5
            index_scriere = [index(:,2:5), index(:,1)];
    end
    index_rot = bin2dec(index_scriere);   % index_scriere = write index

    % write data in memory for the next stage
    for k = 1 : N
        in_x_rot(k) = x_out_inter(index_rot(k)+1);
    end
end
out_x = in_x_rot;
figure(2), subplot(2,1,2), stem(f, fftshift(abs(out_x)), 'r')

MATLAB - 4
[Figure: "32 samples of input signal", stem plot of the input, amplitude between -2 and 2]

[Figure: two panels, "FFT in Matlab - reference" (top) and "Radix 2 implementation" (bottom), magnitude versus normalized frequency from -0.5 to 0.5]

EXTRA STUFF: RADIX 4 DIT ALGORITHM

RADIX 4 DIT ALGORITHM - 1

If N is a power of 4, then a Radix 4 DIT algorithm can be used (using the same method as for Radix 2, the complexity can be computed)

$$F(p) = \sum_{n=0}^{N-1} f(n)\, W_N^{np}$$

$$= \sum_{n=0}^{N/4-1} f(4n)\, W_N^{4np} + \sum_{n=0}^{N/4-1} f(4n+1)\, W_N^{(4n+1)p} + \sum_{n=0}^{N/4-1} f(4n+2)\, W_N^{(4n+2)p} + \sum_{n=0}^{N/4-1} f(4n+3)\, W_N^{(4n+3)p}$$

$$= \sum_{n=0}^{N/4-1} f(4n)\, W_N^{4np} + W_N^{p} \sum_{n=0}^{N/4-1} f(4n+1)\, W_N^{4np} + W_N^{2p} \sum_{n=0}^{N/4-1} f(4n+2)\, W_N^{4np} + W_N^{3p} \sum_{n=0}^{N/4-1} f(4n+3)\, W_N^{4np}$$

$$= \sum_{n=0}^{N/4-1} f(4n)\, W_{N/4}^{np} + W_N^{p} \sum_{n=0}^{N/4-1} f(4n+1)\, W_{N/4}^{np} + W_N^{2p} \sum_{n=0}^{N/4-1} f(4n+2)\, W_{N/4}^{np} + W_N^{3p} \sum_{n=0}^{N/4-1} f(4n+3)\, W_{N/4}^{np}$$

$$= P(p) + W_N^{p}\, Q(p) + W_N^{2p}\, R(p) + W_N^{3p}\, S(p), \quad p = 0, 1, ..., N-1$$

RADIX 4 DIT ALGORITHM - 2


P(p), Q(p), R(p) and S(p) are each an N/4-point DFT
Although p = 0, 1, ..., N-1, each sum needs to be computed only over p = 0, 1, ..., N/4-1, since they are periodic with period N/4
The transform F(p) can be broken into 4 parts as below

$$F(p) = P(p) + W_N^{p}\, Q(p) + W_N^{2p}\, R(p) + W_N^{3p}\, S(p)$$

$$F(p + N/4) = P(p) + W_N^{p+N/4}\, Q(p) + W_N^{2(p+N/4)}\, R(p) + W_N^{3(p+N/4)}\, S(p)$$

$$= P(p) - j W_N^{p}\, Q(p) - W_N^{2p}\, R(p) + j W_N^{3p}\, S(p)$$

$$F(p + 2N/4) = P(p) + W_N^{p+2N/4}\, Q(p) + W_N^{2(p+2N/4)}\, R(p) + W_N^{3(p+2N/4)}\, S(p)$$

$$= P(p) - W_N^{p}\, Q(p) + W_N^{2p}\, R(p) - W_N^{3p}\, S(p)$$

$$F(p + 3N/4) = P(p) + W_N^{p+3N/4}\, Q(p) + W_N^{2(p+3N/4)}\, R(p) + W_N^{3(p+3N/4)}\, S(p)$$

$$= P(p) + j W_N^{p}\, Q(p) - W_N^{2p}\, R(p) - j W_N^{3p}\, S(p)$$

$$p = 0, 1, ..., N/4 - 1$$
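The four equations above define a radix-4 butterfly, and since P, Q, R and S are themselves N/4-point DFTs, the scheme can be applied recursively. Here is a minimal Python sketch under the assumption that N is a power of 4; the function name `fft_radix4` is illustrative, and the recursive form is used only for clarity.

```python
import cmath

def fft_radix4(f):
    """Recursive radix-4 decimation-in-time FFT (len(f) must be a power of 4)."""
    N = len(f)
    if N == 1:
        return list(f)
    P = fft_radix4(f[0::4])   # DFT of f(4n)
    Q = fft_radix4(f[1::4])   # DFT of f(4n+1)
    R = fft_radix4(f[2::4])   # DFT of f(4n+2)
    S = fft_radix4(f[3::4])   # DFT of f(4n+3)
    quarter = N // 4
    F = [0j] * N
    for p in range(quarter):
        w = cmath.exp(-2j * cmath.pi * p / N)        # W_N^p
        a, b, c, d = P[p], w * Q[p], w ** 2 * R[p], w ** 3 * S[p]
        F[p] = a + b + c + d
        F[p + quarter] = a - 1j * b - c + 1j * d
        F[p + 2 * quarter] = a - b + c - d
        F[p + 3 * quarter] = a + 1j * b - c - 1j * d
    return F

# Hypothetical 16-sample input, used only for illustration
spectrum16 = fft_radix4([complex(n % 5, -n) for n in range(16)])
```

The factors -j, -1 and +j cost no multiplications, only sign changes and swaps of real and imaginary parts, which is where radix 4 saves work compared with two radix-2 stages.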

RADIX 4 DIT ALGORITHM - 3

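$$W_b = \exp\left(-j \frac{2\pi}{N} p\right), \quad W_c = \exp\left(-j \frac{2\pi}{N} 2p\right), \quad W_d = \exp\left(-j \frac{2\pi}{N} 3p\right), \quad p = 0, 1, ..., N/4 - 1$$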

RADIX 4 DIT ALGORITHM - 4

Example for N = 16 = 4^2 (Stage 0 and Stage 1)

Inputs, in digit-reversed order: f(0), f(4), f(8), f(12), f(1), f(5), f(9), f(13), f(2), f(6), f(10), f(14), f(3), f(7), f(11), f(15)

Outputs, in natural order: F(0), F(1), F(2), ..., F(15)

ADDRESS GENERATOR - 1

ADDRESS GENERATOR - 2

Each index is written in binary, then as two base-4 digits, then in decimal

Input
Index of time-domain samples 0110 = 1 2 = 6
Reverse order digit (reverse all base-4 digits) 1001 = 2 1 = 9

Stage 1
Index of output 1110 = 3 2 = 14
Index to write (reverse the last 2 digits) 1011 = 2 3 = 11

Stage 2
Index of output 0010 = 0 2 = 2
Index of frequency-domain samples (split outputs) 0100 = 2 0 = 4

Try the same for N=64! It will make sense.
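Digit reversal works like bit reversal, but moves groups of two bits (one base-4 digit) at a time. The Python sketch below is an illustration with the assumed name `digit_reverse_base4`; it reproduces the input ordering of the N=16 example and can be called with 3 digits to try N=64.

```python
def digit_reverse_base4(index, digits):
    """Return `index` with its base-4 digits (2 bits each) in reverse order."""
    result = 0
    for _ in range(digits):
        result = (result << 2) | (index & 3)   # move the lowest base-4 digit
        index >>= 2
    return result

order_16 = [digit_reverse_base4(i, 2) for i in range(16)]   # N = 16: 2 digits
order_64 = [digit_reverse_base4(i, 3) for i in range(64)]   # N = 64: 3 digits
print(order_16)
```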

MATLAB -1
function [Ao, Bo, Co, Do] = butterfly_4(Ai, Bi, Ci, Di, Coef2, Coef3, Coef4)
% Radix-4 butterfly: Coef2, Coef3 and Coef4 rotate the 2nd, 3rd and 4th inputs
Ao = Ai +    Coef2*Bi + Coef3*Ci +    Coef4*Di;
Bo = Ai - 1j*Coef2*Bi - Coef3*Ci + 1j*Coef4*Di;
Co = Ai -    Coef2*Bi + Coef3*Ci -    Coef4*Di;
Do = Ai + 1j*Coef2*Bi - Coef3*Ci - 1j*Coef4*Di;
end

%%%%%%%%%%%%%%%%
% MAIN

clc, clear all, close all
N = 16;
Nr_st = log2(N)/2;                        % number of radix-4 stages
n = 0:N-1;
in_x = sin(2*pi*1000/8000*n) + sin(2*pi*500/8000*n);
% reference zone
figure, stem(in_x), title('16 samples of input signal')
X = fft(in_x, N);
f = linspace(-0.5, 0.5, N);
figure(2), subplot(2,1,1), stem(f, fftshift(abs(X))), title(' FFT in Matlab - reference'), xlabel('normalized frequency')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% initialization zone - the digit reverse order write address
index_init = dec2bin(0:N-1, log2(N));     % 2 bits per base-4 digit
index_rob = [index_init(:,3:4), index_init(:,1:2)];   % swap the two digits
index_ro = bin2dec(index_rob);
in_x_rot = zeros(1, N);
for k = 1 : N
    in_x_rot(k) = in_x(index_ro(k)+1);
end

MATLAB -2
x_out_inter = zeros(N,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% rotation factors matrix
Coef = zeros(1, N);
for t = 1 : N
    Coef(t) = exp(-1j*2*pi/N * (t-1));
end
% address for coef: one row per stage, one column per butterfly
addr_coef_ind = [ 0 0 0 0
                  0 1 2 3 ];
for stage = 1 : Nr_st
    for t = 1 : N/4
        addr = addr_coef_ind(stage, t);
        % butterfly usage
        [x_out_inter(4*t-3), x_out_inter(4*t-2), x_out_inter(4*t-1), x_out_inter(4*t)] = ...
            butterfly_4(in_x_rot(4*t-3), in_x_rot(4*t-2), in_x_rot(4*t-1), in_x_rot(4*t), Coef(addr*1 + 1), Coef(addr*2 + 1), Coef(addr*3 + 1));
    end
    % write address: for N=16 both stages swap the two base-4 digits
    index_scriere = [index_init(:,3:4), index_init(:,1:2)];
    index_rot = bin2dec(index_scriere);
    % write data in memory for the next stage
    for k = 1 : N
        in_x_rot(k) = x_out_inter(index_rot(k)+1);
    end
end
out_x = in_x_rot;
figure(2), subplot(2,1,2), stem(f, fftshift(abs(out_x)), 'r'), title(' Radix 4 implementation'), xlabel('normalized frequency')

MATLAB -3
[Figure: "16 samples of input signal", stem plot of the input, amplitude between -2 and 2]

[Figure: two panels, "FFT in Matlab - reference" (top) and "Radix 4 implementation" (bottom), magnitude versus normalized frequency from -0.5 to 0.5]
