The Design of FIR Filter Base On Improved DA Algorithm and Its FPGA Implementation

589
The Design of FIR Filter Base on Improved DA Algorithm and its FPGA
Implementation

Shunwen Xiao, Yajun Chen, Peng Luo
College of Physics and Electronic information
China West Normal University,CWNU
Nanchong, Sichuan, 637002, P. R. China
xiaoshw@163.com

AbstractWhen the DA (distributed arithmetic) algorithm is
directly applied in FPGA (field programmable gate array) to
realize FIR (finite impulse response) filter, it is difficult to
achieve the best configuration in the coefficient of FIR filter,
the storage resource and the computing speed. According to
this problem, the paper provides the detailed analysis and
discussion in the algorithm, the memory size and the look-up
table speed. Also, the corresponding optimization and
improvement measures are discussed and the concrete
hardware realization of the circuit is presented. The design
based on Altera EP2C5T144C8 chips is synthesized under the
integrated environment of QUARTUS7.1. The results of
simulation and test show that this method greatly reduces the
FPGA hardware resource and the high speed filtering is
achieved. The design has a big breakthrough compared to the
traditional FPGA realization.

Keywords- FIR filter; DA algorithm; FPGA
I. INTRODUCTION
The DA algorithm was initially proposed by Crosier in
1973. It attracted attention again after the FPGA LUT
(Look-up Table) was invented by Minx in the early 90s of
last century and effectively applied in the design of FIR
filters [1]. The principle of DA algorithm is as follows [1].
The output of linear time-invariant system is shown as
Eq. (1).

1
M
m m
m
Y A X
=
=
(1)
Where A
m
is a fixed factor, X
m
is the input data
m
X <1). X
m
can be expressed as Eq. (2) using the binary
complement.
1
0
1
2
N
n
m m mn
n
X x x
=
= +
(2)
Where x
mn
is 0 or 1, x
m0
is sign bit, x
m, N-1
is the least
significant bit. Then Y can be expressed as Eq. (3).
1
0
1 1
( 2 )
M N
n
m mn m
m n
Y A x x
= =
=

1
0
1 1 1
2 ( )
N M M
n
m mn m m
n m m
A x A x
= = =
= +

(3)
In Eq. (3), as the value of x
mn
is 0 or 1, there are 2
M

kinds of different results of
=
M
m
mn m
x A
1
. If we construct a
LUT which can store all the possible combination of values
[2], we can calculate the value of 2
M
in advance and store
them in the LUT. Using x
mn
as the LUT address signal, the
shifting (2
-1
operation) and adding operation are carried out
on the output of the LUT. Then
=
M
m
mn m
x A
1
can be realized
through N-1 cycles and the result of multiplication-
accumulation can be achieved directly. So the complicated
multiplication-accumulation operation is converted to the
shifting and adding operation. The parallel computing is
adopted to improve the speed of calculation.
The complicated multiplication-accumulation operation
is converted to the shifting and adding operation when the
DA algorithm is directly applied to realize linear time-
invariant system. However, the scale of the LUT will
increase exponentially with the coefficient. If the coefficient
is small, it is very convenient to realize through the rich
structure of FPGA LUT; while the coefficient is large, it
will take up a lot of storage resources of FPGA and reduce
the calculation speed. Meanwhile, the N-1 cycles also result
in the too long LUT time and the low computing speed.
The paper presents the improvement and optimization of
the DA algorithm aiming at the problems of the
configuration in the coefficient of FIR filter, the storage
resource and the calculating speed, which make the memory
size smaller and the operation speed faster to improve the
computational performance.

II. IMPROVED DESIGN OF THE DA ALGORITHM
From Eq. (2), Xm can be expressed as Eq. (4).
1
[ ( )]
2
m m m
X X X = (4)
Where the X
m
can be expressed as Eq. (5) according
to the binary complement operation [3].
Volumn 2
978-1-4244-5586-7/10/$26.00 2010 IEEE C
590
1
( 1)
0
1
2 2
N
n N
m mn
m
n
X x x

=
= + +
(5)
Put Eq. (5) and Eq. (2) into Eq. (4), Eq. (6) can be
achieved.
1
( 1)
0
0
1
1
[ ( ) ( )2 2 ]
2
N
n N
m mn
m m mn
n
X x x x x

=
= +
(6)
For convenience, two variables are defined as follows:
0
0 0
( ) m
m m
x x =
( ) mn
mn mn
x x =
In which, as the value of
mn
x is 0 or 1, so the value of
mn
and
0 m
is 1. Then Eq. (6) can be expressed as Eq.
(7).
1
( 1)
1
1
[ 2 2 ]
2
N
n N
m mn
n
X

=
=
(7)
Put Eq. (7) into Eq. (1), Eq. (8) can be achieved.
1
( 1)
1 1
1
[ 2 2 ]
2
K N
n N
k mn
k n
Y A

= =
=

1
( 1)
1 1 1
1
[ 2 2 ]
2
N M M
n N
m mn m
n m m
A A

= = =
=

(8)
As there are 2
M
different kinds of results of
=
M
m
mn m
A
1
and the value of
mn
is 1, so the results
show positive and negative symmetry property. If the
positive and negative sign are not considered, there are only
2
M-1
different kind of results and the size of storage will
reduce by half.
Through the algorithm optimization, Eq. (8) can be
simplified as Eq. (9).
1 / 2 / 2
( 1)
1 1 1
1
[ 2 2 ]
2
N M M
n N
m mn m
n m m
Y A x A

= = =
=

(9)
Although the storage size reduces by half through the
algorithm optimization, but the scale of the LUT will
increase exponentially with the m. So the scale of the
LUT can be further reduced by decreasing the size of m
[4-5]. Then the
mn
M
m
m
x A
=
2 /
1
in Eq. (9) can be defined as Eq.
(10).
/ 2
1 1 1
M a b z
m mn m mn m mn m m
m m m a m y
A x A x A x A x
= = = + =
= + + +

"
(10)
In which, z), )|, |u!u!, so an inner-
product operation with the scale of 2
M
will be realized
through several LUTs with different or same depth and
adders. The scale of the memory is 2
a
2
b-a
2
z-y
2
M/2-z
.
For example, if using two LUTs with depth of M/4 and
adders to achieve it, namely,

/ 2 / 4 / 2
1 1 / 4 1
M M M
m mn m mn m mn
m m m M
A x A x A x
= = = +
= +

(11)
Then the size of memory is 2
M/4
2
M/4
2
M/4+1
. Compared
with the memory size which is 2
M-1
before optimizing, its
memory scale is only 2
-3M/4+2
times of the original.
Through the algorithm improvement, the hardware
resource is reduced and the operation speed is improved.
The simplified hardware circuit structure is shown in Fig. 1.

Fig.1 The circuit structure through the algorithm improvement

III. THE CIRCUIT DESIGN OF FIR FILTER
A. The design index and parameters extraction
A 16th-order FIR filter is designed. Its Parameters are as
follows: the sampling frequency is 2.25 MHz; the passband
cutoff frequency is 100 kHz; the width of the input data, the
output data and the filter coefficient is 8, 16 and 20 bits
respectively. It adopts Hamming window to design and
MatLab simulation to calculate its unit-sampling response h
(k) and amplify it 2
16
times. The h (k) is as follows.
h0)h!)?98D h!)h!+)8D
h?)h!3)!3b+D h3)h!?)?!8D
h+)h!!)+03D h)h!0)b+00D
hb)h9)99bD h)h8)8908D
B. The hardware circuit unit
The hardware circuit is shown in Fig. 2 [6-8].

Fig. 2 The circuit structure of FIR system

Volumn 2
591
When using the DA algorithm to implement the linear
time-invariant system, the algorithm is optimized according
the method of section 2. The prestoring value corresponding
to the upper half of the memory address of LUT storage will
be the negative of the lower half and then the LUT reduces
by half using symmetry. The address maker circuit
generates the LUT address. The upper half of the address
looks up its corresponding prestoring value. Meanwhile, the
address is used as Ctrl control-adding-decrease implement
to complete the positive and negative conversion between
the prestoring value corresponding to the upper and lower
half of it. According to result of the improvement and
optimization, the LUT is divided into two 4-input LUTs and
the address maker circuit divides the input signals into four
segments in accordance with the 4-input LUT. The speed of
signal sampling under the control of the FPGA can be
adjusted. The data buffer can be established according to the
order of the filter. As the designed filter is a 16th-order one,
so the sampled serial data can be sent to the 20 bits serial-in-
parallel-out shift register, and then the data is divided and
sent to the LUT in turn. As the coefficient is amplified 2
16

times, the obtained result is reduced by the output circuit
accordingly.
C Circuit simulation and test
The design based on Altheas EP2C5T144C8 chips, the
series of Cyclone, is synthesized under the integrated
environment of QUARTUS 7.1. The input sequence is
x(n)=[0, 3, 1, 1, 0, 2, 1, 4, 3, 2, 2, 0, 1, 2, 2, 3] and the
simulation waveform is shown in Fig. 3. The filter
input/output in the waveform uses hexadecimal
representation. Compared to MatLab simulation, the error of
hardware simulation is 1%.The designed results are
consistent with what we desired. The circuit structure of FIR
filter works well.

Fig. 3 The simulation waveform

The implementation of filter based on FPGA is realized
on EP2C5T144C8 chips by using of IP core, the DA
algorithm and the improved DA algorithm separately. The
results of compilation and test show that the needed LE is
1522, 1269 and 776 respectively when IP core, the DA
algorithm and the improved DA algorithm is used to
implement filter. The improved algorithm can greatly
reduce the hardware resource and improve the throughput
efficiency. It meets the design requirements entirely.

IV. CONCLUSIONS
The complicated multiplication-accumulation operation
is converted to the shifting and adding operation when the
DA algorithm is directly applied to realize FIR filter.
Aiming at the problems of the best configuration in the
coefficient of FIR filter, the storage resource and the
calculating speed, the DA algorithm is optimized and
improved in the algorithm structure, the memory size and
the look-up table speed. The arithmetic expression has clear
layers of derivation process and the circuit structure is
reasonable, which make the memory size smaller and the
operation speed faster. The design improves greatly
compared to the conventional FPGA realization and it can
be flexibility applied to implement high-pass, low-pass and
band-stop filters by changing the order and the LUT
coefficient.

REFERENCES
[1] L. Zhao, W. H. Bi, F. Liu, Design of digital FIR bandpass filter using
distributed algorithm based on FPGA, Electronic Measurement
Technology, 2007,vol. 30, pp.101-104.
[2] H. Chen, C. H. Xiong, S. N. Zhong, FPGA-based efficient
programmable polyphase FIR filter, Journal of Beijing lnsititute of
Technology, 2005, vol. 14, pp. 4-8.
[3] Y. T. Xu, C. G. Wang, J. L. Wang, Hardware Implementation of FIR
Filter Based on DA Algorithm, Journal of PLA University of
Science and Technology, 2003, vol. 4, pp. 22-25.
[4] D. Wu, Y. H. Wang, H. Z. Lu, Distributed Arithmetic and its
Implementation in FPGA, Journal of National University of Defense
Technology, 2000, vol. 22, pp.16-19.
[5] L. Wei, R. J. Yang, X. T. Cui, Design of FIR filter based on
distributed arithmetic and its FPGA implementation, Chinese
Journal of Scientific Instrument, 2008, vol. 29, pp. 2100-2104.
[6] W. Zhu, G. M. Zhang, Z. M. Zhang, Design of FIR Filter Based on
Distributed Algorithm with Parallel Structure, Journal of Electronic
Measurement and Instrument, 2007, vol. 21, pp. 87-92.
[7] W. Wang, M. N. S. Swamy, M. O. Ahmad, Novel Design and FPGA
Implemention of DA-RNS FIR Filters, Journal of Circuits Systems
and Computers, 2004, vol. 13, pp. 1233-1249.
[8] P. Girard, O. Hron, S. Pravossoudovitch, and M. Renovell, Delay
Fault Testing of Look-Up Tables in SRAM-Based FPGAs, Journal
of Electronic Testing, 2005, vol. 21, pp. 43-55.
Volumn 2

The Design of FIR Filter Base On Improved DA Algorithm and Its FPGA Implementation

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

The Design of FIR Filter Base On Improved DA Algorithm and Its FPGA Implementation

Uploaded by

Copyright:

Available Formats

589

You might also like