International Journal of Advanced Technology & Engineering Research (IJATER)
ISSN NO: 2250-3536 VOLUME 2, ISSUE 2, MAY2012 59
LOW POWER DESIGNING OF FIR FILTERS
Sarita Chouhan¹, Rajasthan Technical University, Kota, Rajasthan, India
Yogesh Kumar², Rajasthan Technical University, Kota, Rajasthan, India
Abstract
There are several quantities one would like to optimize when designing a VLSI circuit, and designing an integrated circuit that is efficient in power, area, and speed simultaneously has become a very challenging problem. Power dissipation is a critical parameter in modern VLSI design. Multiplication occurs frequently in finite impulse response (FIR) filters, fast Fourier transforms, discrete cosine transforms, and convolution. A good way to save significant power in a VLSI design is therefore to reduce its dynamic power, which is the major part of total power dissipation. Here, we propose the design of an FIR filter using a high-speed, low-power multiplier based on a new implementation approach. The multiplier is designed using a modified Booth algorithm, controlled by a detection unit built from an AND gate, and a carry save adder. The modified Booth algorithm reduces the number of partial products generated by a factor of 2, and the carry save adder avoids unwanted additions and thus minimizes switching power dissipation. The proposed high-speed, low-power multiplier attains a 30 percent speed improvement and a 22 percent power reduction in the modified Booth algorithm compared with conventional array multipliers.

Index Terms: Multipliers, Modified Booth's algorithm, Spartan-3E FPGA, VHDL

1. Introduction

Multipliers play an important part in today's digital signal processing (DSP) systems. Examples of their use occur in implementations of recursive and transversal filters, discrete Fourier transforms, correlation, and range measurement, and in most of these cases a multiplier unit designed for the specific purpose is sufficient. Multipliers have large area and long latency, and they consume considerable power. Therefore, low-power multiplier design has been an important part of low-power VLSI system design.
The main research hypothesis of this work is that high-level optimization of multiplier designs produces more power-efficient solutions than optimization at low levels alone. Specifically, we consider how to optimize the internal algorithm and architecture of multipliers and how to control the active multiplier resources to match the characteristics of the external data. The primary objective is power reduction with small area and delay overhead. With new algorithms or architectures it is even possible to achieve both power reduction and area/delay reduction, which is another strength of high-level optimization. To meet these requirements of smaller area, lower power consumption and faster operation, Booth's algorithm is widely used in practice. This encoding algorithm is suitable for two's-complement (signed) number multiplication. Booth's algorithm also requires redundant partial-product bits for the so-called sign extension. In any multiplication algorithm, the operation is decomposed into a summation of partial products, each of which represents a multiple of the multiplicand to be added to the final result. Nowadays almost all high-speed multipliers apply a radix-4 recoding multiplication algorithm. In a radix-2 algorithm, we first form the products of the multiplicand Y with every bit of the multiplier X, generating a set of words called partial products. Next, all the partial products are added. Some form of redundant arithmetic is used to perform these additions as fast as possible, and the speed is usually increased with a Wallace reduction tree. In the conventional Wallace tree, the multi-input partial-product bits at the same bit position are successively compressed into a final sum and carry signal pair by a series of single-bit full adders (also called 3-2 compressors). At the output, we have two words (sum and carry), which must be added as fast as possible by a carry-propagate adder (CPA).
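To make the radix-2 scheme concrete, here is a minimal behavioral sketch in Python. The authors' design is in VHDL; Python is used here only as an executable model. The sketch assumes unsigned n-bit operands, because signed operands are what Booth recoding handles, and the function names are illustrative.

```python
from typing import List


def radix2_partial_products(y: int, x: int, n: int) -> List[int]:
    """Radix-2 partial products of unsigned n-bit operands Y (multiplicand) and X (multiplier).

    Partial product i is Y AND-ed with multiplier bit x_i, shifted left by i
    positions so that it carries the binary weight 2**i.
    """
    return [(y << i) if (x >> i) & 1 else 0 for i in range(n)]


def multiply_radix2(y: int, x: int, n: int) -> int:
    """Add all partial products (hardware would use a Wallace tree plus a CPA)."""
    return sum(radix2_partial_products(y, x, n))


if __name__ == "__main__":
    print(radix2_partial_products(13, 11, 4))  # 11 = 0b1011 -> [13, 26, 0, 104]
    print(multiply_radix2(13, 11, 4))          # 143
```

An n-bit multiplier always yields n partial products here. This is the count that radix-4 recoding halves.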
The Wallace tree structure is a form of carry-save adder (CSA) array. Radix-4 multiplication improves on the basic algorithm because fewer partial products enter the Wallace tree to be reduced. This is achieved by recoding the multiplier, changing it from a 2's-complement format to a signed-digit representation with digits from the set {-2, -1, 0, 1, 2}.
2. Related Work
A substantial amount of research has gone into developing efficient architectures for multipliers, given their widespread use and complexity. Schemes such as bisection, Baugh-Wooley and Hwang implement a 2's-complement architecture using repetitive modules with uniform interconnection patterns. However, the irregular tree-array form they use does not permit an efficient VLSI realization.

More regular multiplier designs, better suited to VLSI and based on the Booth recoding technique, have been proposed. The main purpose of these designs is to increase circuit performance by reducing the number of partial products. Although the Booth algorithm is simple, it is sometimes difficult to design for higher radices, because an increasing number of multiples of the multiplicand must be pre-computed within the multiplier unit. In the modified Booth algorithm, only about half as many partial products need to be added. In our work, the improvement in delay and power has the same principal source as in the Booth architecture, namely the reduction of the number of partial-product terms, while the regularity of an array multiplier is retained. We show that our architecture can be extended more naturally to higher radices, using fewer logic levels and hence producing far fewer spurious transitions.

3. Booth Algorithm
Booth's algorithm involves repeatedly adding one of two predetermined values, A and S, to a product P and then performing a rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier, respectively, and let x and y be the number of bits in m and r. [4]

1. Determine the values of A and S and the initial value of P. All of these numbers should have a length of (x + y + 1) bits.
   (a) A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1) bits with zeros.
   (b) S: Fill the most significant bits with the value of (-m) in two's complement notation. Fill the remaining (y + 1) bits with zeros.
   (c) P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
   (a) If they are 01, find the value of P + A. Ignore any overflow.
   (b) If they are 10, find the value of P + S. Ignore any overflow.
   (c) If they are 00, do nothing. Use P directly in the next step.
   (d) If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in step 2 by a single place to the right. Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. The result is the product of m and r.
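The five steps above can be modeled directly in Python. This is a minimal sketch, assuming m and r are given as signed integers that fit in x and y bits; `booth_multiply` is a hypothetical name. This register formulation is known to fail when m is the most negative x-bit value, because -m then does not fit in x bits. The sketch simply assumes that case does not occur.

```python
def booth_multiply(m: int, r: int, x: int, y: int) -> int:
    """Booth's algorithm with registers A, S and P, following steps 1-5 above.

    m: signed multiplicand that fits in x bits (assumed not the most negative
       x-bit value, whose negation does not fit in x bits).
    r: signed multiplier that fits in y bits.
    Returns the signed product m * r.
    """
    width = x + y + 1
    mask = (1 << width) - 1

    # Step 1: A = m in the top x bits, S = -m in the top x bits,
    # P = r followed by one extra zero bit (top x bits are zero).
    a = ((m & ((1 << x) - 1)) << (y + 1)) & mask
    s = ((-m & ((1 << x) - 1)) << (y + 1)) & mask
    p = (r & ((1 << y) - 1)) << 1

    for _ in range(y):  # step 4: repeat steps 2 and 3 y times
        last_two = p & 0b11
        # Step 2: 01 -> P + A, 10 -> P + S, 00/11 -> unchanged (overflow ignored).
        if last_two == 0b01:
            p = (p + a) & mask
        elif last_two == 0b10:
            p = (p + s) & mask
        # Step 3: arithmetic shift right by one place, replicating the sign bit.
        sign = p & (1 << (width - 1))
        p = (p >> 1) | sign

    # Step 5: drop the extra LSB and read the remaining x + y bits as signed.
    p >>= 1
    if p & (1 << (x + y - 1)):
        p -= 1 << (x + y)
    return p


if __name__ == "__main__":
    print(booth_multiply(3, -4, 4, 4))  # -12
```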
Booth's algorithm can be applied in two forms: (1) the radix-2 algorithm and (2) the radix-4 algorithm. Both are described in later sections.
4. Hardware Architecture

Field Programmable Gate Arrays (FPGAs) can be reprogrammed as many times as needed to achieve the desired result. The major design benefit of this lies in the ability to test designs that might work. Before the development of the FPGA, the fabrication process could be quite expensive and very time consuming. The use of FPGAs in the design process allows more design flexibility and reduces cost and development time. If the design fails after being tested on an FPGA, the designer can simply rework the design and download it to the FPGA again. Use of an FPGA thus eliminates the loss in development time caused by a faulty initial design, and it tells the designer whether or not the design works.
4.1 Adders
4.1.1 Ripple Carry Adder
This is the simplest type of adder, but it is not very efficient when a large number of bits is used. Because each full adder must wait for the carry from the previous stage, the delay increases linearly with bit length.
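Here is a bit-level sketch of the ripple-carry adder, written as a Python behavioral model with illustrative function names. The loop mirrors the hardware chain: stage i cannot finish until the carry from stage i - 1 is known. With n = 16, it also models the 16-bit adder of Section 4.1.4.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> Tuple[int, int]:
    """n-bit ripple-carry adder built from a chain of full adders.

    The carry produced by stage i is an input of stage i + 1, so in hardware
    the worst-case delay grows linearly with n.
    Returns (n-bit sum, carry_out).
    """
    total = 0
    carry = cin
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(ripple_carry_add(0xFFFF, 0x0001, 16))  # (0, 1): the 16-bit sum wraps, carry out set
```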
Figure 1: Block diagram of Ripple carry adder
4.1.2 Carry Select Adder

In this scheme, each block of bits is added in two ways: (1) assuming a carry-in of 0, and (2) assuming a carry-in of 1. When the actual carry into the block is known, a multiplexer selects the corresponding result.
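The carry-select idea can be sketched as follows. This Python model assumes the word width n is a multiple of the block size, and it uses ordinary integer addition as a stand-in for each block's internal adder. Both are simplifying assumptions, not details of the design described here.

```python
from typing import Tuple


def block_add(a: int, b: int, width: int, cin: int) -> Tuple[int, int]:
    """Plain width-bit adder standing in for one ripple-carry block: (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, n: int, block: int = 4, cin: int = 0) -> Tuple[int, int]:
    """n-bit carry-select adder; assumes n is a multiple of the block size."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, n, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # In hardware both candidate results are computed at the same time...
        sum0, cout0 = block_add(a_blk, b_blk, block, 0)
        sum1, cout1 = block_add(a_blk, b_blk, block, 1)
        # ...and the incoming carry only drives a multiplexer.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << lo
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(0xFFF, 0x001, 12))  # (0, 1)
```

The critical path becomes one block addition plus a chain of multiplexers, at the cost of roughly duplicating each block's adder.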
Figure 3: Block diagram of Carry select adder
Figure 4: Simulation Output
4.1.3 Carry Look Ahead Adder
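The carry look-ahead adder produces carries faster because additional circuitry generates the carry bits in parallel whenever the inputs change, instead of letting them ripple from stage to stage. The look-ahead logic is built from the per-bit propagate (P) and generate (G) signals defined below.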
Figure 5: Block diagram of Carry Look ahead adder
$$P_i = a_i + b_i, \qquad G_i = a_i b_i$$
$$S_i = (a_i \oplus b_i) \oplus C_i, \qquad C_{i+1} = G_i + P_i C_i$$
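These equations can be turned into a small Python model of a carry look-ahead adder. The sketch is illustrative only. It evaluates the fully expanded carry expressions in software, whereas hardware designs typically limit fan-in by grouping bits into small look-ahead blocks. The function name is hypothetical.

```python
from typing import Tuple


def cla_add(a: int, b: int, n: int, c0: int = 0) -> Tuple[int, int]:
    """n-bit carry look-ahead adder using P_i = a_i + b_i and G_i = a_i b_i.

    Each carry is computed from the expanded recurrence
        C_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_1 G_0 + P_i ... P_0 C_0,
    so it depends only on the G and P signals and C_0, never on another carry.
    Returns (n-bit sum, carry_out).
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate
    p = [x | y for x, y in zip(a_bits, b_bits)]  # propagate (OR form, as in the equations)

    carries = [c0]
    for i in range(n):
        c = c0
        for k in range(i + 1):
            c &= p[k]                            # P_i ... P_0 C_0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]                     # G_j P_{j+1} ... P_i
            c |= term
        carries.append(c)

    # S_i = (a_i XOR b_i) XOR C_i
    s = 0
    for i in range(n):
        s |= (a_bits[i] ^ b_bits[i] ^ carries[i]) << i
    return s, carries[n]


if __name__ == "__main__":
    print(cla_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```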
Figure 6: Simulation Output
4.1.4 Sixteen Bit Full Adder
This is simply a 16-bit adder with two 16-bit inputs and one carry-in, producing a 16-bit sum and a single carry-out bit.
Figure 7: Simulation Output
4.1.5 Carry Save Adder

First we compute the sum of each column ignoring any carries and, separately, compute the carry of each column. The final sum is then obtained by adding the sum word 's' and the carry word 'c' in a final stage of addition.
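The carry-save principle can be sketched in a few lines of Python, assuming non-negative integer operands. The function names are illustrative. Each output bit depends only on the three input bits of its own column, so no carry propagates until the final addition.

```python
from typing import Iterable, Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """Reduce three non-negative integers to a (sum, carry) pair with x + y + z == s + c.

    Every column is handled by an independent full adder, so no carry propagates
    and the delay does not depend on the word width.
    """
    s = x ^ y ^ z                                # column sums, carries ignored
    c = ((x & y) | (x & z) | (y & z)) << 1       # column carries, one position to the left
    return s, c


def csa_sum(operands: Iterable[int]) -> int:
    """Add many non-negative operands: CSA rows reduce them to two, then one final addition."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]
    return sum(ops)  # final carry-propagate addition of the 's' and 'c' words


if __name__ == "__main__":
    print(carry_save_add(5, 6, 7))  # (4, 14), and 4 + 14 == 18
```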
Figure 8: Block Diagram of Carry Save adder
Figure 9: Simulation Output
4.1.6 Carry Skip Adder
The carry-skip adder speeds up addition by letting a carry skip to position i without waiting for it to ripple through all the preceding stages. As in carry-completion addition, the operation time varies with the operands. To implement a carry-skip adder, the stages are divided into blocks, and carry-skip logic is added to each block to detect when the carry into the block can be passed directly to the next block.
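Here is a minimal Python sketch of a carry-skip adder, assuming n is a multiple of the block size (4 by default). In software both paths give the same carry. The model only marks where the hardware skip multiplexer would take the faster path. The function name is illustrative.

```python
from typing import Tuple


def carry_skip_add(a: int, b: int, n: int, block: int = 4, cin: int = 0) -> Tuple[int, int]:
    """n-bit carry-skip adder; assumes n is a multiple of the block size.

    Returns (n-bit sum, carry_out).
    """
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, n, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # Skip detection: every bit of the block propagates (a_i XOR b_i == 1).
        skip = (a_blk ^ b_blk) == mask
        # Sum bits are always produced by rippling through the block.
        c, s = carry, 0
        for i in range(block):
            ai, bi = (a_blk >> i) & 1, (b_blk >> i) & 1
            s |= (ai ^ bi ^ c) << i
            c = (ai & bi) | (c & (ai ^ bi))
        # When the block propagates, its carry-out equals its carry-in, so the
        # carry-in can be forwarded directly (the skip path) instead of waiting for c.
        carry = carry if skip else c
        result |= s << lo
    return result, carry


if __name__ == "__main__":
    print(carry_skip_add(0xFF, 0x00, 8, cin=1))  # (0, 1): both blocks propagate
```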
Figure 10: Block Diagram of Carry Skip Adder
Figure 11: Simulation Output
4.2 Multipliers
4.2.1 Array Multiplier
A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to perform rapid multiplication of two numbers in binary representation. It is built from binary adders: the array multiplier forms the partial-product bits with AND gates and sums them in a regular array of adders. [10]
Figure 12: Block Diagram of Array Multiplier
Figure 13: Simulation Output
4.2.2 Booth Multiplier
Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied. It is the standard technique used in chip design and provides significant improvements over the long multiplication technique. [7]

The Booth multiplier is widely used in ASIC-oriented products because of its higher computing speed and smaller area. This encoding technique has two advantages: (a) only about half as many partial products are needed during the computation, that is, the number of partial products is reduced by a factor of 2; and (b) the delay on the critical path is less than that of the Baugh-Wooley multiplier.
4.2.2.1 Radix 2 Multiplier
Radix-2 Booth recoding allows smaller, faster multiplication circuits by recoding the numbers that are multiplied. The reduction of the number of partial products by a factor of 2 is achieved with the radix-4 version described in the next section. The radix-2 form has two major drawbacks:
1) The number of add/subtract operations is variable, which is inconvenient for a parallel multiplier.
2) It is inefficient when the multiplier contains isolated 1s, since each isolated 1 produces two nonzero digits (an addition and a subtraction).
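Both drawbacks show up when we compute the radix-2 Booth digits d_i = x_{i-1} - x_i (with x_{-1} = 0) for a couple of multipliers. This Python sketch uses illustrative function names. A run of 1s collapses into two nonzero digits, while isolated 1s double the number of add/subtract operations, and the count varies with the data.

```python
from typing import List


def booth_radix2_digits(x: int, n: int) -> List[int]:
    """Radix-2 Booth digits d_i = x_{i-1} - x_i (with x_{-1} = 0), least significant first."""
    digits, prev = [], 0
    for i in range(n):
        bit = (x >> i) & 1
        digits.append(prev - bit)  # 0 inside runs of equal bits, -1/+1 at run boundaries
        prev = bit
    return digits


def digits_value(digits: List[int]) -> int:
    """Value represented by the signed digits (equals the two's-complement value of x)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def nonzero_count(digits: List[int]) -> int:
    """Number of add/subtract operations the recoded multiplier requires."""
    return sum(1 for d in digits if d != 0)


if __name__ == "__main__":
    for x in (0b00111100, 0b01010101):
        d = booth_radix2_digits(x, 8)
        print(f"{x:08b}: {bin(x).count('1')} ones -> {nonzero_count(d)} add/sub operations")
```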
Figure 14: Block diagram of Radix 2 multiplier
Figure 15: Simulation output
4.2.2.2 Radix 4 Multiplier
These multiplication schemes handle more than one bit of the multiplier in each cycle. A higher representation radix leads to fewer digits. Thus, a digit-at-a-time multiplication algorithm requires fewer cycles as we move to higher radices, which means fewer partial products. The reduction in the number of cycles, along with the use of recoding and carry-save adders, leads to significant gains in speed over basic multipliers. [6]
Booth's algorithm [4] was published in 1951, and about a decade later MacSorley proposed a modification of it. The modified Booth's algorithm (radix-4 recoding) starts by appending a zero to the right of $x_0$ (the multiplier LSB). Triplets are taken beginning at position $x_{-1}$ and continuing to the MSB, with one bit overlapping between adjacent triplets. If the number of bits in X (excluding $x_{-1}$) is odd, the sign bit (MSB) is extended one position to ensure that the last triplet contains 3 bits. In every step we obtain a signed digit that multiplies the multiplicand to generate a partial product entering the Wallace reduction tree. The meaning of each triplet is given in Table 1.
Table 1: Radix 4
This recoding scheme applied to a parallel multiplier halves the number of partial products so the multiplication time and the hardware requirements decrease. [2]
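The recoding described above can be sketched in Python as follows. The function names are illustrative. Each overlapping triplet (b_{2i+1}, b_{2i}, b_{2i-1}) maps to the digit -2*b_{2i+1} + b_{2i} + b_{2i-1}, and an odd-width multiplier is sign-extended by one bit, as the text describes. Running the main block prints the eight-row triplet table.

```python
from typing import List


def radix4_digit(b_hi: int, b_mid: int, b_lo: int) -> int:
    """Digit for the triplet (b_{2i+1}, b_{2i}, b_{2i-1}): -2*b_{2i+1} + b_{2i} + b_{2i-1}."""
    return -2 * b_hi + b_mid + b_lo


def radix4_digits(b: int, n: int) -> List[int]:
    """Recode an n-bit two's-complement multiplier into radix-4 digits, least significant first."""
    bits = b & ((1 << n) - 1)
    if n % 2:
        # Odd width: extend the sign bit one position so the last triplet has 3 bits.
        bits |= ((bits >> (n - 1)) & 1) << n
        n += 1
    padded = bits << 1  # append b_{-1} = 0 to the right of the LSB
    digits = []
    for i in range(n // 2):
        t = (padded >> (2 * i)) & 0b111  # triplet b_{2i+1} b_{2i} b_{2i-1}, overlapping by one bit
        digits.append(radix4_digit(t >> 2, (t >> 1) & 1, t & 1))
    return digits


if __name__ == "__main__":
    # Recoding table: triplet -> digit (the multiple of the multiplicand to add).
    for t in range(8):
        print(f"{t:03b} -> {radix4_digit(t >> 2, (t >> 1) & 1, t & 1):+d}")
    print(radix4_digits(-7, 4))  # -7 = 1001 -> [1, -2], since 1 + (-2) * 4 == -7
```

An n-bit multiplier yields ceil(n/2) digits, each in {-2, -1, 0, 1, 2}, so only about half as many partial products are generated.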
4.2.2.3 Modified Radix 4 Multiplier using Carry Save Adder
One of the major speed-enhancement techniques used in modern digital circuits is the ability to add numbers with minimal carry propagation. Carry save adders are one of the most widely used techniques for fast arithmetic in industry. In this work, we use carry save adders in the partial-product lines of the new multiplier in order to speed up carry propagation along the array. This multiplier uses a new approach to handle operands in 2's complement with exactly the same structure as an array multiplier, with the same unsigned bit products for all bits except those that involve a sign bit. The regularity of this multiplier makes it suitable for carry save adders, because their ability to reduce three or four numbers to two, in a time independent of the width of the numbers, makes them a much more efficient alternative to a traditional ripple-carry adder. Carry-save adders (CSA) can be used both to reduce the number of addition cycles and to make each cycle faster. A row of binary full adders is used as a mechanism to reduce three numbers to two numbers, rather than to find a single sum. A carry save adder is very fast because it simply outputs the carry bits instead of propagating them to the left. As presented in the next section, we apply carry save adders in the partial-product lines of an array multiplier circuit in order to speed up carry propagation along the array. [3]
Figure 16: Radix 4 Using Carry Save Adder
Figure 17: Simulation Output

Consider the multiplication of two 2's-complement integers, an n-bit multiplicand A and an n-bit multiplier B:

$$P = A \times B = \sum_{i=0}^{2n-1} p_i \, 2^i \qquad (1)$$
$$A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i \, 2^i \qquad (2)$$
$$B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i \, 2^i \qquad (3)$$

Here $p_i$ denotes the i-th output product bit, and $a_i$ and $b_i$ denote the data bits of the multiplicand and the multiplier, respectively. Assuming n is even, the n-bit multiplier B can be rewritten as

$$B = \sum_{i=0}^{n/2-1} \left(-2 b_{2i+1} + b_{2i} + b_{2i-1}\right) 2^{2i} \qquad (4)$$

where $b_{-1} = 0$. Note that the terms in the bracket in Eq. (4) take the values {-2, -1, 0, 1, 2}. Each recoded value performs a certain operation on the multiplicand A, and the multiple additions at each stage are then required to generate the correct partial product. It is worth mentioning that the operation -A can be realized by inverting the multiplicand and adding 1 at the least significant bit. Substituting Eq. (4) into Eq. (1), we obtain Eq. (5):

$$P = \sum_{i=0}^{n/2-1} \left(-2 b_{2i+1} + b_{2i} + b_{2i-1}\right) A \, 2^{2i} \qquad (5)$$

The scanning of triplets begins from $b_{-1}$ and proceeds to the MSB with one-bit overlapping. Thus, only n/2 partial-product rows need to be computed.
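Eq. (5) writes the product as a sum of n/2 rows d_i * A * 2^(2i), where each d_i is a radix-4 digit in {-2, -1, 0, 1, 2}. The following Python sketch builds those rows and forms -A and -2A by bitwise inversion plus 1, as described above. It assumes n is even and uses an (n + 2)-bit row width. The width and the function names are choices made for this illustration.

```python
from typing import List


def radix4_partial_products(a: int, b: int, n: int) -> List[int]:
    """Partial-product rows d_i * A * 4**i of Eq. (5) for n-bit two's-complement A and B (n even).

    A negative row is formed by inverting the bits of the positive multiple and
    adding 1 at the least significant bit. Rows are kept in n + 2 bits, enough
    for +/-2A, and then read back as signed integers.
    """
    width = n + 2
    mask = (1 << width) - 1
    padded = (b & ((1 << n) - 1)) << 1                # append b_{-1} = 0
    rows = []
    for i in range(n // 2):
        t = (padded >> (2 * i)) & 0b111                # triplet b_{2i+1} b_{2i} b_{2i-1}
        d = -2 * (t >> 2) + ((t >> 1) & 1) + (t & 1)
        multiple = (abs(d) * a) & mask                 # 0, A or 2A (2A is a one-bit left shift)
        if d < 0:
            multiple = ((~multiple & mask) + 1) & mask  # -X = (NOT X) + 1
        if multiple >> (width - 1):                    # sign bit set: interpret as negative
            multiple -= 1 << width
        rows.append(multiple << (2 * i))               # weight 2**(2i)
    return rows


def radix4_multiply(a: int, b: int, n: int) -> int:
    """Sum only n/2 rows, instead of the n rows of a radix-2 array."""
    return sum(radix4_partial_products(a, b, n))


if __name__ == "__main__":
    print(radix4_partial_products(5, -3, 4))  # two rows whose sum is 5 * -3
    print(radix4_multiply(5, -3, 4))          # -15
```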
A Wallace tree (WT) is an efficient hardware implementation of a digital circuit that multiplies two integers. The WT multiplier sums all bits of the same weight in a merged tree rather than adding the partial products in pairs. Full adder (FA) and half adder (HA) cells are used to add three or two equally weighted bits, respectively, producing two bits: a sum bit with the same weight as the operands and a carry bit with a weight one position higher. Whenever an FA is used, the height of the WT is reduced by a factor of 3:2. The final tree is composed of as many levels of FA and HA cells as are necessary to reduce the height of the tree to 2. The hardware synthesis process for a WT multiplier consists mainly of two steps. The first step is to arrange the partial-product bits as the initial WT structure, as shown in Fig. 2 for the case of a 4 x 4 multiplier with operands (a3, a2, a1, a0) and (b3, b2, b1, b0). Second, a series of FA and HA transformations is applied to the WT structure until the tree height is reduced to 2. At this point, any n-bit conventional adder may be used to add the remaining two n-bit rows of the tree to get the final multiplication result.
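The two synthesis steps can be mimicked with a column-based Python model. This sketch uses one simple reduction policy, assumed for illustration. In every stage, each column is split into groups of three bits for full adders, a leftover pair goes to a half adder, and a single leftover bit passes through. Operands are unsigned, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def wallace_multiply(a: int, b: int, n: int) -> Tuple[int, int]:
    """Unsigned n x n multiplication by Wallace-tree reduction of bit columns.

    Returns (product, number_of_reduction_stages).
    """
    # Step 1: initial tree, column w holds every bit product a_i b_j with i + j == w.
    cols: Dict[int, List[int]] = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> i) & (b >> j) & 1)

    # Step 2: apply FA (3 bits -> 2) and HA (2 bits -> 2) cells until every column height <= 2.
    stages = 0
    while max(len(bits) for bits in cols.values()) > 2:
        stages += 1
        nxt: Dict[int, List[int]] = defaultdict(list)
        for w in sorted(cols):
            bits, k = cols[w], 0
            while len(bits) - k >= 3:                  # full adder on three equally weighted bits
                x, y, z = bits[k:k + 3]
                nxt[w].append(x ^ y ^ z)               # sum stays at weight w
                nxt[w + 1].append((x & y) | (x & z) | (y & z))  # carry moves to weight w + 1
                k += 3
            if len(bits) - k == 2:                     # half adder on a leftover pair
                x, y = bits[k:k + 2]
                nxt[w].append(x ^ y)
                nxt[w + 1].append(x & y)
                k += 2
            nxt[w].extend(bits[k:])                    # a single leftover bit passes through
        cols = nxt

    # Final step: a conventional adder sums the two remaining rows.
    row0 = sum(bits[0] << w for w, bits in cols.items() if len(bits) > 0)
    row1 = sum(bits[1] << w for w, bits in cols.items() if len(bits) > 1)
    return row0 + row1, stages


if __name__ == "__main__":
    print(wallace_multiply(13, 11, 4))  # (143, 2): two FA/HA stages for a 4 x 4 tree
```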
Figure 18: Block Diagram of Wallace Tree Multiplier
Figure 19: Simulation Output
4.3 FIR Filter
FIR filters with a variable number of taps have been widely used in many application fields. An FIR filter can be implemented with a memory in which an address generation unit and a modulo unit access the stored samples in a circular manner. A simple FIR filter is described by a convolution operation.
Figure 20: Block Diagram of FIR Filter
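The difference equation is

$$y[n] = \sum_{k=0}^{N-1} B_k \, x[n-k]$$

where $B_k$ is the set of filter coefficients, $x[n]$ and $y[n]$ are the input and output samples, and N is the number of taps.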
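The convolution y[n] = sum over k of B_k * x[n-k] can be sketched as a direct-form filter whose delay line is addressed circularly, in the spirit of the address-generation and modulo units mentioned above. This Python model uses integer samples and is illustrative only. The class and method names are hypothetical, and each B_k * x[n-k] product is where a hardware multiplier such as those compared here would sit.

```python
from typing import List, Sequence


class FIRFilter:
    """Direct-form FIR filter y[n] = sum_k B_k * x[n - k] with a circular delay line.

    The delay line is addressed modulo the number of taps, mimicking the
    address-generation and modulo units described in the text.
    """

    def __init__(self, coeffs: Sequence[int]) -> None:
        self.coeffs: List[int] = list(coeffs)
        self.delay: List[int] = [0] * len(self.coeffs)  # last N input samples
        self.head = 0                                    # slot for the newest sample

    def step(self, sample: int) -> int:
        """Push one input sample x[n] and return the output y[n]."""
        taps = len(self.coeffs)
        self.delay[self.head] = sample
        acc = 0
        for k, b_k in enumerate(self.coeffs):
            # x[n - k] is stored k slots behind the write pointer (modulo addressing).
            acc += b_k * self.delay[(self.head - k) % taps]  # one multiply per tap
        self.head = (self.head + 1) % taps
        return acc


if __name__ == "__main__":
    fir = FIRFilter([1, 2, 3])
    print([fir.step(x) for x in [1, 0, 0, 0]])  # impulse response: [1, 2, 3, 0]
```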
5. Simulations and Result
In this work we evaluate the performance of the proposed low-power FIR filter by comparing the radix-4 multiplier with the other multipliers. The multipliers were coded in VHDL, synthesized with the Xilinx tools and simulated with ModelSim to obtain the power and delay reports. The simulation result for the FIR filter using the array multiplier is given in Figure 21.
Figure 21: Simulation Output Of FIR Filter
The synthesis reports for the different adders are compared in Table 2.
Table 2: Analysis of adders

Parameter                  16-bit full adder  Ripple carry  Carry look-ahead  Carry save  Carry select  Carry skip
No. of slices              18                 18            16                18          16            16
No. of 4-input LUTs        31                 32            28                32          32            32
No. of bonded inputs       49                 50            42                81          32            32
No. of bonded outputs      49                 50            42                81          32            32
Delay (ns)                 32.930             34.559        28.439            10.448      12.98         15.862
Memory (kb)                71952              69584         69584             69392       69390         69390
Power (W)                  0.144              0.140         0.144             0.151       0.138         0.137
Power-delay product (nJ)   4.742              4.838         4.095             1.583       1.791         2.173
The synthesis reports for the different multipliers are compared in Table 3.
Table 3: Analysis of multipliers

Parameter                  Array multiplier  Radix-2 multiplier  Radix-4 multiplier  Wallace tree multiplier
No. of slices              76                55                  14                  192
No. of 4-input LUTs        133               99                  24                  384
No. of bonded inputs       32                34                  27                  90
No. of bonded outputs      32                34                  27                  90
Delay (ns)                 32.237            18.726              13.798              286.487
Memory (kb)                72272             70480               69712               113684
Power (W)                  0.143             0.140               0.143               0.155
Power-delay product (nJ)   4.609             2.621               1.973               44.405
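The power-delay product is power multiplied by delay, so W x ns gives nJ. The short Python check below recomputes it from the delay and power rows of Tables 2 and 3. Assuming PDP = delay x power, the recomputed values match the reported ones to within about 0.006 nJ. For example, the carry save adder gives 1.578 against the reported 1.583, possibly because the reported power is rounded. The check also confirms the minima cited in the conclusion. The dictionary and function names are illustrative.

```python
# Delay (ns) and power (W) as reported in Tables 2 and 3.
ADDERS = {
    "16-bit full adder": (32.930, 0.144),
    "Ripple carry adder": (34.559, 0.140),
    "Carry look-ahead adder": (28.439, 0.144),
    "Carry save adder": (10.448, 0.151),
    "Carry select adder": (12.98, 0.138),
    "Carry skip adder": (15.862, 0.137),
}
MULTIPLIERS = {
    "Array multiplier": (32.237, 0.143),
    "Radix-2 multiplier": (18.726, 0.140),
    "Radix-4 multiplier": (13.798, 0.143),
    "Wallace tree multiplier": (286.487, 0.155),
}


def power_delay_product(delay_ns: float, power_w: float) -> float:
    """Power-delay product in nJ (1 W x 1 ns = 1 nJ)."""
    return delay_ns * power_w


def best(designs: dict) -> str:
    """Name of the design with the smallest power-delay product."""
    return min(designs, key=lambda name: power_delay_product(*designs[name]))


if __name__ == "__main__":
    for table in (ADDERS, MULTIPLIERS):
        for name, (delay, power) in table.items():
            print(f"{name:24s} {power_delay_product(delay, power):7.3f} nJ")
        print("lowest PDP:", best(table))
```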
6. Conclusion
Here, we proposed a novel design for an FIR filter using a parallel multiplier with a carry save adder (modified Booth algorithm). The implementation of the algorithm, with its architecture and logic design, is presented, and the speed, power and complexity of the design are compared with those of other designs. The proposed multiplier is also area efficient, and the smaller area reduces cost. The delay is reduced and the processing speed is higher than with other conventional techniques. The proposed activity evaluation method yields low power estimates with very fast estimation times. Finally, we calculated and compared the power-delay product of various adders and multipliers and concluded that, for a minimal-power FIR filter design, the radix-4 multiplier with a carry save adder should be used, since the carry save adder and the radix-4 multiplier have the lowest power-delay products among the adders and multipliers compared.
References
[1] A. Tisserand, "Automatic generation of low-power circuits for the evaluation of polynomials," in Proc. 40th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, U.S.A.: IEEE, Oct. 2006, pp. 2053-2057.
[2] Zhijun Huang, "High level optimization techniques for low power multiplier design," University of California, Los Angeles, 2003.
[3] L. Ciminiera and P. Montuschi, "Carry-Save Multiplication Schemes Without Final Addition," IEEE Transactions on Computers, vol. 45, no. 9, Sep. 1996.
[4] A. D. Booth, "A signed binary multiplication technique," Quart. J. Mechanics and Applied Math., vol. 4, pp. 235-240, 1951.
[5] W.-C. Yeh and C.-W. Jen, "High-speed Booth encoded parallel multiplier design," IEEE Transactions on Computers, vol. 49, pp. 692-701, 2000.
[6] B. S. Cherkauer and E. G. Friedman, "A Hybrid Radix-4/Radix-8 Low Power Signed Multiplier Architecture," IEEE Transactions on Circuits and Systems, vol. 44, no. 8, Aug. 1997.
[7] H. Lee, "A power-aware scalable pipelined Booth multiplier," in Proc. IEEE Int. SOC Conf., 2004, pp. 123-126.
[8] Tisserand, "Low-power arithmetic operators," in Low Power Electronics Design, C. Piguet, Ed. CRC Press, Nov. 2004, ch. 9.
[9] Nagendra, M. J. Irwin, and R. M. Owens, "Area-time-power tradeoffs in parallel adders," IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 43, no. 10, pp. 689-702, Oct. 1996.
[10] Z. Huang and M. D. Ercegovac, "High-performance low-power left-to-right array multiplier design," IEEE Transactions on Computers, vol. 54, no. 3, pp. 272-283, Mar. 2005.
[11] P. F. Stelling, C. U. Mattel, V. G. Oklobdzija, and R. Ravi, "Optimal Circuits for Parallel Multipliers," IEEE Transactions on Computers, vol. 47, no. 3, pp. 273-285, 1998.
[12] W. C. Yeh and C. W. Jen, "High-Speed Booth Encoded Parallel Multiplier Design," IEEE Transactions on Computers, vol. 49, no. 7, pp. 692-701, 2000.
[13] J. A. Gibson and R. W. Gibbard, "Synthesis and Comparison of Two's Complement Parallel Multipliers," IEEE Transactions on Computers, vol. C-24, pp. 1020-1027, Oct. 1975.
[14] C. S. Wallace, "Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, vol. EC-13, pp. 14-17, 1964.
[15] W. Gallagher and E. Swartzlander, "High Radix Booth Multipliers Using Reduced Area Adder Trees," in Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 545-549, 1994.
[16] E. Costa, J. Monteiro, and S. Bampi, "A New Architecture for Signed Radix 2 Pure Array Multipliers," in IEEE International Conference on Computer Design, pp. 112-117, 2002.
[17] K. H. Tsoi and P. H. W. Leong, "Mullet - a parallel multiplier generator," in Proc. International Conference on Field Programmable Logic and Applications (FPL), 2005, pp. 691-694.
[18] S. Tahmasbi Oskuii, P. G. Kjeldsberg, and O. Gustafsson, "Transition activity aware design of reduction-stages for parallel multipliers," in Proc. 17th Great Lakes Symp. on VLSI, March 2007, pp. 120-125.
[19] Ayman A. Fayed and Magdy A. Bayoumi, "A Novel Architecture for Low-Power Design of Parallel Multipliers," in Proc. IEEE Computer Society Workshop on VLSI, 2001, pp. 0149.
Biographies
SARITA CHOUHAN received her B.Eng. degree in Electronics & Communication from Rajeev Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India, in 2002 and her M.Tech. degree in VLSI Design from Mewar University, Chittorgarh, Rajasthan, India. She is currently an Assistant Professor in the Electronics and Communication Department at Manikya Lal Verma Govt. Textile & Engg. College, Bhilwara, Rajasthan, India. Her areas of interest are applied electronics, microelectronics, VLSI, VHDL, Verilog, EDA, analog CMOS design and low-power optimization. She has authored the textbook EDA and Logic Synthesis (S.K. Kataria and Sons, Delhi, India, 2009), and books on VLSI Design and VHDL Design are in the publishing process. She can be reached at sarita.mlvtec@yahoo.co.in

YOGESH KUMAR is a final-year B.Tech. student pursuing his degree in Electronics & Communication at Rajasthan Technical University, Kota, Rajasthan, India. His areas of interest are VLSI design and MATLAB. He has presented papers at national and international conferences on analog CMOS design using signal processing and is currently working on image processing. He can be reached at yogeshkumar989@ymail.com