Abstract: Hardware realization of FIR filters based on residue number systems leads to increased speed and reduced power, where, besides the popular Mersenne numbers, several moduli of the form $2^n \pm \delta$ ($\delta \ge 3$) are commonly used. However, additional weighted $2^i$ ($i > 1$) end-around carries (EACs) slow down and complicate the required modular adders in comparison to modulo-$(2^n - 1)$ adders. For example, for $\delta = 3$, the modular sum is obtained via $A + B \mp 3c_{out}$, where $A$ and $B$ are modulo-$(2^n \pm 3)$ operands and $c_{out}$ is the carry-out of the binary addition $A + B$. In this paper, a new multioperand modular adder is proposed, where the key improvement is that all the required EAC additions (e.g., $+3c_{out}$) are postponed until after the last filter tap, whereby tens of addition operations take place without the EAC secondary addition; hence considerable savings of time, area consumption, and power dissipation. The proposed deferred EAC addition scheme has been applied to three previous relevant works. The corresponding synthesis results showed over 11%–32%, 27%–29%, and 21%–37% reductions in delay, area, and power measures, respectively. This is achieved despite the area and power overhead of the few stages appended to the pipelined architecture of the filter, which are nevertheless shown to become less significant as the number of filter taps grows.

Index Terms: FIR filter, digital signal processing, residue number system, modular adder.

I. INTRODUCTION

$a_i$ ($0 \le i \le k - 1$) are provided either as constants or dynamic variables. Each tap includes a multiplication and an addition operation, whose circuits can be considered as a pipeline stage.

$$y(t) = \sum_{i=0}^{k-1} a_i\, x(t - i) \tag{1}$$

Introduction of the residue number system (RNS) as a vehicle for implementation of the aforementioned addition and multiplication operations has been shown [3]–[14] to gain advantages in terms of speed, area consumption and power dissipation over the conventional binary FIR filter realizations. An RNS is recognized by a set of often mutually prime moduli as its bases, so that a wide-word multiplier or adder is broken down into narrower RNS computation channels that operate in parallel; hence better performance and commonly lower power dissipation, which is believed to be due to broken carry chains. However, the overhead of residue generation and the final RNS-to-binary conversion must be considered in evaluations. On the other hand, the performance of an RNS operation is dictated by the slowest computing channel, which generally corresponds to the largest modulus. Therefore, it is desirable to set up a moduli set containing small moduli that lead to balanced speed of the corresponding channels.
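As an illustration of the channel decomposition described above, the following minimal Python sketch evaluates one output sample of (1) in three parallel residue channels and converts the result back with the Chinese remainder theorem. The moduli set (256, 255, 257), the coefficient and sample values, and all function names are assumptions chosen for illustration; they are not taken from any of the referenced designs.

```python
from math import prod

# Assumed pairwise-coprime moduli set {2^8, 2^8 - 1, 2^8 + 1} (illustrative only).
MODULI = (256, 255, 257)


def to_rns(x, moduli=MODULI):
    """Forward conversion: one residue per channel."""
    return tuple(x % m for m in moduli)


def fir_channel(coeffs, window, m):
    """One output sample of (1) inside a single modulo-m channel.

    window[i] holds x(t - i), aligned with coeffs[i] = a_i.
    """
    acc = 0
    for a, x in zip(coeffs, window):
        acc = (acc + (a % m) * (x % m)) % m  # modular multiply-accumulate
    return acc


def from_rns(residues, moduli=MODULI):
    """Reverse conversion via the Chinese remainder theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return total % M


def fir_rns(coeffs, window, moduli=MODULI):
    """Run every channel independently, then reconstruct the binary result."""
    residues = tuple(fir_channel(coeffs, window, m) for m in moduli)
    return from_rns(residues, moduli)
```

The channels never exchange carries, which is the property the text attributes to RNS; the result is exact only while the true output stays below the product of the moduli.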
BELGHADR AND JABERIPUR: FIR FILTER REALIZATION VIA DEFERRED EAC MODULAR ADDITION 3
Fig. 4. Proposed New1 modulo-$m_j$ $k$-tap FIR filter using deferred end-around carry modular addition.
Fig. 5. FIR filter tap comparison, a) Proposed architecture used in Fig. 4, b) Conventional architecture used in Fig. 2 and [13].
modulo-$m_j$ addition, where $m_j = 2^n \pm \delta_j$, can be described via (5) and (6) (with $\delta$ representing an arbitrary $\delta_j$). These equations are the elaborated editions of (3) and (4), where $W = A + B + \delta = w_n w_{n-1} \ldots w_0$ ($w_{n+1} w_n \ldots w_0$, in case of $2^n + \delta$) represents the interim sum.

$$|A + B|_{2^n - \delta} = \begin{cases} |W|_{2^n} & \text{if } w_n = 1 \\ W - \delta & \text{if } w_n = 0 \end{cases} \;=\; w_{n-1} \ldots w_0 - \delta\,\overline{w_n} \tag{5}$$

$$|A + B|_{2^n + \delta} = |A + B|_{2^{n+1} - \delta} = w_n \ldots w_0 - \delta\,\overline{w_{n+1}} \tag{6}$$

Direct realization of (5) (and likewise of (6)) requires the costly $-\delta\,\overline{w_n}$ operation in each tap. However, these subtractions (on $w_n = 0$ instances) can be avoided by accumulating the number of required subtractions in a register until after the last tap, when the accumulated value times $\delta$ is subtracted (i.e., only one subtraction). More details are depicted in Fig. 4 and Fig. 5a (which shows one tap of Fig. 4) and are supported by the following additional explanations. To count the required number of $-\delta$ operations, we use an incrementor per tap (i.e., the +1 boxes after each tap in Fig. 4). The incrementor within tap $j$ receives the current count, and its clock is triggered if $w_n = 0$ in tap $j - 1$ (see also Fig. 5a). Let $p$ denote the value of the total count (i.e., the output of the last incrementor in the $(k + 1)$th tap). The correction term $-p\delta$ is obtained via an LUT that is preloaded with $k$ integers $|-p\delta|_{2^n - \delta}$, where $0 \le p < k$. This is indicated by a multiplication box in the $(k + 1)$th tap in Fig. 4, which is followed by a modulo-$(2^n - \delta)$ adder (see Fig. 3), as is shown by the gray crossed circle in the $(k + 2)$th tap of Fig. 4. To appreciate the simplicity of the new architecture (see Fig. 5a), referred to hereafter as "New1", and for ease of comparison, we provide Fig. 5b. This figure contains the structure of each tap of Fig. 2 based on the adder of Fig. 3 (i.e., a general modular adder [26]). To evaluate the figures of merit of [13], one of the reference works that was briefly described at the end of Section II, we assume that the latter modular adder was utilized in [13], due to lack of adequate information therein.

Functionality of the proposed realization of the RNS-FIR filter, regardless of the detailed architecture of its multipliers, can also be demonstrated with a simple numerical example, as follows, for $k = 5$. This is with the understanding that the number of taps is commonly in the order of tens for high frequency selectivity in real applications [2]. Therefore, the overhead of our two extra stages becomes negligible.

Example 1 (5-Tap Case): Equation (7) describes an instance of (2), for $k = 5$.

$$|y(5)|_{m_j} = \left|\; \begin{aligned} &\big|\,|a_0|_{m_j} \times |x(5)|_{m_j}\big|_{m_j} \\ {}+{} &\big|\,|a_1|_{m_j} \times |x(4)|_{m_j}\big|_{m_j} \\ {}+{} &\big|\,|a_2|_{m_j} \times |x(3)|_{m_j}\big|_{m_j} \\ {}+{} &\big|\,|a_3|_{m_j} \times |x(2)|_{m_j}\big|_{m_j} \\ {}+{} &\big|\,|a_4|_{m_j} \times |x(1)|_{m_j}\big|_{m_j} \end{aligned} \;\right|_{m_j} \tag{7}$$
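The deferred correction described above can be traced in software with the following minimal Python sketch for a modulo-$(2^n - \delta)$ channel. It follows (5) but skips the per-tap $-\delta$ subtraction, counts the $w_n = 0$ events in $p$, and applies $|-p\delta|_{2^n-\delta}$ once after the last tap. The function name and the sample values are hypothetical, the `%` operator on products merely stands in for the LUT-based modular multipliers, and the final `% m` is a software shortcut for the modular adder of the last stage; it is a sketch of the idea, not a model of the circuit in Fig. 4.

```python
def fir_channel_deferred(coeffs, window, n, delta):
    """Accumulate k modular products modulo m = 2^n - delta with deferred correction.

    Returns (residue, p), where p is the number of deferred -delta corrections.
    Invariant after every tap: true partial sum == s - p * delta (mod m).
    """
    m = (1 << n) - delta
    mask = (1 << n) - 1

    # Per-tap products |a_i|_m * |x|_m reduced modulo m (stand-in for the multipliers).
    products = [((a % m) * (x % m)) % m for a, x in zip(coeffs, window)]
    if not products:
        return 0, 0

    s = products[0]  # the first tap has nothing to add to
    p = 0            # counter fed by the per-tap incrementors
    for b in products[1:]:
        w = s + b + delta      # interim sum W of (5), at most n + 1 bits
        if (w >> n) & 1:       # w_n = 1: keep |W|_{2^n}, no correction owed
            s = w & mask
        else:                  # w_n = 0: keep W as is and defer the -delta
            s = w
            p += 1

    correction = (-p * delta) % m  # the LUT entry |-p*delta|_m
    return (s + correction) % m, p
```

For example, with $n = 8$ and $\delta = 3$ (modulus 253), coefficients 3, 1, 4, 1, 5 and samples 50, 40, 30, 20, 10 give the partial sums 150, 193, 60, 83, 136 in `s` with $p = 3$, and the single final correction yields $|136 - 9|_{253} = 127 = |380|_{253}$.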
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 6. Proposed New2 modulo-$(2^8 - 1)$ $k$-tap FIR filter, based on RNS1 of [12] and using deferred end-around carry modular addition.
Fig. 7. FIR filter tap comparison, a) Proposed architecture used in Fig. 6, b) Architecture used in RNS1 of [12].
TABLE II
NUMERICAL EXAMPLE TO DEMONSTRATE FUNCTIONALITY OF THE PROPOSED MODULAR FIR FILTER ARCHITECTURE

and that of Fig. 2 (Tap 5). Note that decimal values are used instead of the actual binary, for ease of tracing.
To better demonstrate the impact of the proposed deferred EAC modular addition on RNS-FIR filter realization, without loss of generality, we apply our technique to two other recently reported architectures, RNS1 and RNS2 of [12], which were briefly described at the end of Section II. Figs. 6 and 8 depict the corresponding two new designs, New2 and New3, respectively, where the details of each tap are illustrated in Figs. 7a (our technique applied to RNS1), 7b (RNS1 of [12]), 9a (our technique applied to RNS2), and 9b (RNS2 of [12]).
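For the Mersenne-type modulus used by New2 and New3, the carry-count view of deferral stated in the abstract (the sum is $A + B + \delta c_{out}$ for modulus $2^n - \delta$, with $\delta = 1$ here) can be sketched as follows. This is only a software illustration under that assumption; the function name and operand values are hypothetical, and the sketch does not describe the tap circuits of Figs. 7a and 9a.

```python
def deferred_carry_sum(residues, n=8, delta=1):
    """Add residues modulo 2^n - delta while postponing every end-around carry.

    Each step is a plain n-bit binary addition; carry-outs are only counted.
    Since 2^n == delta (mod 2^n - delta), the counted carries are folded back
    once at the end as delta * carries.
    """
    m = (1 << n) - delta
    mask = (1 << n) - 1
    s = 0
    carries = 0
    for r in residues:
        t = s + r          # binary addition, no EAC applied here
        carries += t >> n  # c_out is 0 or 1 because s < 2^n and r < m
        s = t & mask
    return (s + delta * carries) % m  # single deferred correction
```

With operands 200, 100, 250, 7, 252 and $n = 8$, three carries are counted and the final step gives $|41 + 3|_{255} = 44$, which equals $|809|_{255}$; setting `delta=3` on the same operands gives $|41 + 9|_{253} = 50$.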
Fig. 8. Proposed New3 modulo-$(2^8 - 1)$ $k$-tap FIR filter, based on RNS2 of [12] and using deferred end-around carry modular addition.
Fig. 9. FIR filter tap comparison, a) Proposed architecture used in Fig. 8, b) Architecture used in RNS2 of [12].
For evaluation and comparison purposes, the required hardware descriptions (i.e., all the required memory units, registers, buffers, adders and multipliers) for the circuits under test, regarding the three comparison sets, are implemented and simulated to verify their correct functioning. The corresponding HDL codes are mapped to the TSMC 90 nm CMOS standard cell library, using the Synopsys Design Compiler. Note that there are other components (such as binary-to-RNS and reverse converters) that are exactly the same in all designs and thus are not taken into account in our evaluations. Regarding the evaluation of power dissipation, it is worth mentioning that the power measures are extracted based on simulation of synthesis results with back-annotation of toggling activity, where uniformly distributed sample input values are applied.

Two general scenarios are considered in our evaluation experiments. One compares the results of single tap realizations (see Subsection A, below). The other mainly aims to consider the impact of the extra stages, and thus compares the corresponding implementations in one similar channel and in the span of all taps (including the extra stages), whose number varies from 16 to 1024 (see Subsection B). The exact measures resulting from our experiments are compiled in Tables III-VII. However, for ease of comparison and better reading, we provide eight plots (see Figs. 10-17) based on the contents of these tables. Note that since the architectures RNS1 and RNS2 of the reference work [12] cannot be realized for moduli other than $2^n - 1$, we restrict New2 and New3 to the same moduli. However, for comparison with [13], note that neither their design nor ours is dependent on the value of $\delta$. Therefore, for synthesis purposes, we have considered the widest possible registers that may be required depending on $\delta$, $n$, and $k$. In other words, hardware realization of the modulo-$(2^n \pm \delta)$ filter tap/channel consumes the same amount of hardware and follows the same structure for any value of $\delta$, as long as it provides sufficient space for its registers. This is possible due to the realization of the multipliers using LUTs and also the use of a general structure for modular adders. However, in actual implementations, where
the moduli set is clearly known (i.e., all $\delta_j$s are determined), register sizes can be precisely set with no additional memory allocation. Moreover, there are two restrictions in [13]; namely, the moduli of the form $2^n \pm \delta$ should be prime, with $n \le 6$ to allow for reasonable LUT sizes. Although our contribution does not enforce such limitations, for fair comparison, our New1 design observes the same restrictions as in [13].

A. Tap Comparison

The structural difference between the proposed tap architecture and those of the reference works can be captured by examining Figs. 5, 7 and 9, where critical delay paths (highlighted in red) contain the same multiplier architecture within each comparison set.

We should reiterate that, in our experiments, all adders of [12] and [13] have been realized via parallel prefix architecture (see also Section III), while all other basic building blocks are implemented as proposed in the original papers.

Comparison Set 1: Regarding the comparison of [13] and New1 designs, the Adder1 and one $n$-bit multiplexer of the former design are to be compared against the incrementor (and the associated register) of the latter. Therefore, an obvious delay improvement (due to lack of the multiplexer) and area/power reduction (due to lack of the Adder1 and multiplexer) are expected for the proposed work. This expectation is confirmed by the synthesis results that are reflected in Tables III and IV.

TABLE III
AREA (mm$^2$) AND AT (AREA-TIME PRODUCT) COMPARISON FOR SINGLE TAP

TABLE IV
POWER (µW) AND ENERGY (PDP) COMPARISON FOR SINGLE TAP

Comparison Sets 2 and 3: In the comparisons between the architectures in [12] and the corresponding proposed circuits, the components of the critical delay paths are the modular and non-modular adder blocks of the reference work and the simple non-modular adders and incrementors of the proposed architectures. In the proposed realizations, lower
area and power consumptions are expected, due to lack of carry-in supplements in the utilized parallel prefix adder, and replacement of large modular adder blocks with simple non-modular adders. The curves in Figs. 10-13 compare the area, area-time product (AT), power, and energy (power-delay product or PDP) measures, respectively, for the three comparison sets (i.e., [13] vs. New1, RNS1 of [12] vs. New2, and RNS2 of [12] vs. New3) and only for $n = 8$. These are drawn based on the data of Tables III and IV, which are extracted from repeated experiments with different time constraints applied to the synthesis process. Note that the curves related to the proposed designs cover a range of higher working frequencies (see the left-most part of the curves) not shared with those of the reference works, where the synthesis tool was not able to produce circuits as fast.

Fig. 10. Area comparison of FIR filter tap (New designs vs. references).
Fig. 11. AT comparison of FIR filter tap (New designs vs. references).
Fig. 12. Power comparison of FIR filter tap (New designs vs. references).
Fig. 13. Energy comparison of FIR filter tap (New designs vs. references).

TABLE V
COMPARISON OF FIGURES OF MERIT FOR SINGLE TAP OF FIR FILTER (IMPROVEMENT FIGURES IN THE 2's COMPLEMENT COLUMN REGARD THE PERCENTAGE OF NEW2 MEASURES OVER THE 2's COMPLEMENT EXPERIENCE)

The least delay values acquired by the synthesis tool show 13.2%, 16.8%, and 48.9% speed-up against the reference works [13], RNS1 [12], and RNS2 [12], respectively. Greater impacts are evident through the curves of Fig. 10 for area reduction (27.3%, 27.8%, and 29.2%) and Fig. 12 for power saving (21.2%, 28.7%, and 37.2%). Note that the latter reported area and power improvements regard the highest frequency achieved by both designs in each comparison set, not those of the highest frequency experienced by each design. To better capture the exact figures of merit, the values are
summarized in Table V. Acknowledgement of RNS advantages in FIR filter realization by previous studies was addressed in Section I. Nevertheless, we provide design parameters for a plain 2's complement FIR tap implementation in the same technology, obtained from a behavioral description given to the synthesis tool. The results are compared against the fastest RNS alternative (i.e., New2) in Table V. Superiority of the latter is clearly evident.

Although we have run our experiments only for $n = 8$ and $n = 16$, it is easy to analytically conclude that the same superiority of the new designs in all three comparison sets could be experienced for larger values of $n$. The reason is that the impact of $n$ is equally sensed in the depth (i.e., the number of node levels) of the parallel prefix adders utilized in both designs of each comparison set. That is, doubling $n$ adds one parallel prefix level, with the effect of an additional delay of two simple 2-input gates, and extra area consumption and power dissipation due to the almost doubled size of the circuits.

B. Overhead of Extra Stages

The synthesis results show that the delays of the two extra terminating stages in Fig. 4 (i.e., 0.48 ns and 0.36 ns) are less than that of a single tap and thus do not violate the pipeline clocking that is based on one filter tap delay. Similarly, extra circuitry exists at the final stages of the reference architectures RNS1 and RNS2 of [12], which must be evaluated in this part. The improved area and power measures that were reported in Tables III and IV regard the comparison of one tap. However, in order to consider the overhead of the added final stages of the proposed and reference architectures, modulo-$(2^n \pm \delta)$ filter channels ($2^n - 1$ for the comparison with reference works RNS1 and RNS2) with varying number of taps (including the final taps) are realized and compared for the proposed and reference designs.

Fig. 14. Area improvement for modulo-$(2^8 - \delta)$ channel of the realized filters for the three comparison sets.
Fig. 15. AT improvement for modulo-$(2^8 - \delta)$ channel of the realized filters for the three comparison sets.
Fig. 16. Power improvement for modulo-$(2^8 - \delta)$ channel of the realized filters for the three comparison sets.
Fig. 17. PDP improvement for modulo-$(2^8 - \delta)$ channel of the realized filters for the three comparison sets.

Tables VI and VII provide the area/AT and power/energy measures, respectively, which are obtained for the highest working frequency that is experienced by both designs in each comparison set, for operands of width $n = 8, 16$ and seven different numbers of taps from 16 to 1024. For ease of comparison, we use the tables' contents for $n = 8$ and different numbers of taps to draw the plots in Figs. 14-17, which demonstrate the improvement percentage (regarding the proposed design vs. the referenced work in each comparison set) in area, AT, power, and energy, respectively. Convergence of
the percentage curves to constant values towards the high number of taps indicates that the overhead of the extra stages becomes negligible as the number of taps grows, as is the case for high frequency selectivity applications. Also note that the improvement curves related to the comparison set of RNS1 [12] and New2 show less variation against different numbers of taps. This is due to the extra circuitry employed at the final stage of the RNS1 filter architecture.

TABLE VI
AREA AND AT (AREA-TIME PRODUCT) VALUES FOR MODULO-$(2^n - \delta)$ FIR FILTER CHANNEL DESIGNS

TABLE VII
POWER AND ENERGY (POWER-DELAY PRODUCT) VALUES FOR MODULO-$(2^n - \delta)$ FIR FILTER CHANNEL DESIGNS

V. CONCLUSION

It is well known that introduction of the residue number system into hardware realization of FIR filters is advantageous. On the other hand, different studies of the problem of RNS moduli selection concluded that the popular moduli set $\tau = \{2^n, 2^n + 1, 2^n - 1\}$ may not be adequate to cover the required dynamic range in such applications. Therefore, the use of repeated moduli of the form $2^n \pm 1$, with larger $n$, and additional moduli of the form $2^n \pm \delta$ has been experienced in RNS realization of FIR filters. Regarding the required modulo-$(2^n \pm \delta)$ adders, however, the existing realizations for arbitrary $\delta$-values are by far not adequately compatible with $\tau$-adders, particularly in timing balance. This can result in longer tap delay, a problem for which we were motivated to look for a solution.

We reviewed the previous RNS-FIR filter realizations and focused on their multi-operand modular adder architectures, which we found not very efficient. Therefore, we proposed a new technique for realization of such adders that is based on postponing the end-around carry (EAC) addition. We studied the impact of this method on the consecutive modulo-$(2^n \pm \delta)$ additions that occur in FIR filter RNS realization. A few
extra stages were added to the sequence of taps for EAC correction, with no penalty on the pipeline clocking time. Our synthesis results showed over 11-32%, 27-29%, and 21-37% reductions in delay, area, and power measures, respectively. These improvements were achieved despite the area and power cost of the stages added to the pipelined architecture of the filter, which was shown to become less significant as the number of filter taps grows.

As for relevant future work, we plan to study the impact of the deferred EAC technique on RNS multipliers, especially in RNS-FIR filter applications.

REFERENCES

[1] L. Tan and J. Jiang, Digital Signal Processing: Fundamentals and Applications, 2nd ed. Orlando, FL, USA: Academic, 2013, p. 876.
[2] A. Nannarelli and M. Re, "Residue number systems: A survey," Tech. Univ. Denmark, Kongens Lyngby, Denmark, Tech. Rep. 2008-04, 2008.
[3] W. L. Freking and K. K. Parhi, "Low-power FIR digital filters using residue arithmetic," in Proc. Conf. Rec. 31st Asilomar Conf. Signals, Syst. Comput., vol. 1. Nov. 1997, pp. 739–743.
[4] R. Conway and J. Nelson, "Improved RNS FIR filter architectures," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 1, pp. 26–28, Jan. 2004.
[5] G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re, "Low power and low leakage implementation of RNS FIR filters," in Proc. Conf. Rec. 39th Asilomar Conf. Signals, Syst. Comput., 2005, pp. 1620–1624.
[6] T. K. Shahana, R. K. James, B. R. Jose, K. P. Jacob, and S. Sasi, "Performance analysis of FIR digital filter design: RNS versus traditional," in Proc. ISCIT, Sydney, NSW, Australia, 2007, pp. 1–5.
[7] W. Jenkins and B. Leon, "The use of residue number systems in the design of finite impulse response digital filters," IEEE Trans. Circuits Syst., vol. CS-24, no. 4, pp. 191–201, Apr. 1977.
[8] G. C. Cardarilli, A. Nannarelli, and M. Re, "Residue number system for low-power DSP applications," in Proc. Conf. Rec. 41st Asilomar Conf. Signals, Syst. Comput., Pacific Grove, CA, USA, 2007, pp. 1412–1416.
[9] P. Patronik, K. Berezowski, S. J. Piestrak, J. Biernat, and A. Shrivastava, "Fast and energy-efficient constant-coefficient FIR filters using residue number system," in Proc. ISLPED, Fukuoka, Japan, 2011, pp. 385–390.
[10] D. Živaljević, N. Stamenković, and V. Stojanović, "Digital filter implementation based on the RNS with diminished-1 encoded channel," in Proc. TSP, Prague, Czech Republic, 2012, pp. 662–666.
[11] N. I. Chervyakov, P. A. Lyakhov, and K. S. Shulzhenko, "FIR filters in two-stage residue number system," in Proc. EnT, Moscow, Russia, 2014, pp. 26–29.
[12] K. S. Reddy and S. K. Sahoo, "An approach for fixed coefficient RNS-based FIR filter," Int. J. Electron., vol. 104, no. 8, pp. 1–19, 2017.
[13] G. C. Cardarilli, A. Nannarelli, and M. Re, "Reducing power dissipation in FIR filters using the residue number system," in Proc. 43rd IEEE Midwest Symp. Circuits Syst., vol. 1. Lansing, MI, USA, Aug. 2000, pp. 320–323.
[14] I. Kouretas and P. Vassilis, "Delay-variation-tolerant FIR filter architectures based on the residue number system," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2013, pp. 2223–2226.
[15] J. H. Choi, N. Banerjee, and K. Roy, "Variation-aware low-power synthesis methodology for fixed-point FIR filters," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 1, pp. 87–97, Jan. 2009.
[16] A. Del Re, A. Nannarelli, and M. Re, "A tool for automatic generation of RTL-level VHDL description of RNS FIR filters," in Proc. Eur. Conf. Exhib. Design, Autom. Test, vol. 1. 2004, pp. 686–687.
[17] G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re, "Impact of RNS coding overhead on FIR filters performance," in Proc. Conf. Rec. 41st Asilomar Conf. Signals, Syst. Comput., Pacific Grove, CA, USA, 2007, pp. 1426–1429.
[18] Y. Liu and E. M.-K. Lai, "Moduli set selection and cost estimation for RNS-based FIR filter and filter bank design," Design Autom. Embed. Syst., vol. 9, no. 2, pp. 123–139, Jun. 2004.
[19] S. Negovan, "Digital FIR filter architecture based on the residue number system," Facta Univ.-Ser., Electron. Energetics, vol. 22, pp. 125–140, Apr. 2009.
[20] A. Lindahl and L. Bengtsson, "A low-power FIR filter using combined residue and radix-2 signed-digit representation," in Proc. DSD, 2005, pp. 42–47.
[21] L. Kalampoukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalamatianos, "High-speed parallel-prefix modulo $2^n - 1$ adders," IEEE Trans. Comput., vol. 49, no. 7, pp. 673–680, Jul. 2000.
[22] G. Jaberipur and S. Nejati, "Balanced minimal latency RNS addition for moduli set $2^n - 1$, $2^n$, $2^n + 1$," in Proc. 18th Int. Conf. Syst., Signals, Image Process., Jun. 2011, pp. 159–165.
[23] H. T. Vergos and C. Efstathiou, "Design of efficient modulo $2^n + 1$ multipliers," IET Comput. Digit. Techn., vol. 1, no. 1, pp. 49–57, Jan. 2007.
[24] G. Jaberipur and B. Parhami, "Efficient realisation of arithmetic algorithms with weighted collection of posibits and negabits," IET Comput. Digit. Techn., vol. 6, no. 5, pp. 259–268, Sep. 2012.
[25] S. H. F. Langroudi and G. Jaberipur, "Modulo-$(2^n - 2^q - 1)$ parallel prefix addition via excess-modulo encoding of residues," in Proc. ARITH, Lyon, France, 2015, pp. 121–128.
[26] H. T. Vergos and C. Efstathiou, "On the design of efficient modular adders," J. Circuits Syst. Comput., vol. 14, no. 5, pp. 965–972, 2005.
[27] G. C. Cardarilli, A. Nannarelli, and M. Re, "RNS applications in digital signal processing," in Embedded Systems Design with Special Arithmetic and Number Systems. Cham, Switzerland: Springer, 2017, pp. 181–215.
[28] I. M. Vinogradov, An Introduction to the Theory of Numbers. New York, NY, USA: Pergamon, 1955.

Armin Belghadr received the B.S. degree in computer hardware engineering and the M.S. degree in computer architecture from Shahid Beheshti University, Tehran, Iran, in 2011 and 2013, respectively. He is currently pursuing the Ph.D. degree in computer architecture with the Department of Computer Science and Engineering, Shahid Beheshti University. He focuses on teaching and research in the mainstreams of computer-aided design, computer arithmetic, and 3-D field programmable gate arrays. His research interests include computer arithmetic and particularly residue and redundant number systems.

Ghassem Jaberipur received the B.S. degree in electrical engineering from the Sharif University of Technology in 1974, the M.S. degree in engineering from UCLA in 1976, the M.S. degree in computer science from the University of Wisconsin, Madison, in 1979, and the Ph.D. degree in computer engineering from the Sharif University of Technology in 2004. He is currently a Professor of computer engineering with the Department of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran. He is also with the School of Computer Science, Institute for Research in Fundamental Sciences, Tehran. His main research interest is in computer arithmetic. He is recognized as one of the 50 distinguished graduates for years 1966–2016 in the Sharif University of Technology.