
ECEN 4233: Implementation of Goldschmidt's Algorithm for 16-Bit Division and Square Root
Cale Spratt and Jeremy Storm, Oklahoma State University
Abstract: Division and square root are common functions in digital systems such as microprocessors. Microprocessors with dedicated division and square-root hardware can perform these functions in fewer clock cycles and more power-efficiently. This implementation uses a 36-bit multiplication unit to implement the Goldschmidt algorithm with enough internal precision for an accurate 16-bit result, and it uses a two-stage pipeline to increase the efficiency of the implementation.
Index Terms: Goldschmidt's Algorithm, Carry Propagate Adder, Full Adder, Half Adder, Carry Save Array Multiplier

1 INTRODUCTION
In this implementation, the designed hardware can perform these operations:
a) Division
b) Square root
c) Multiplication

Division and square root are common mathematical functions. Because of their importance, it is beneficial to create an implementation that runs efficiently and produces accurate, reliable results every time it is used. To decrease the number of clock cycles required for a division or square root, the design has been pipelined, allowing two different divisions or square roots to be in progress at the same time. Implementing Goldschmidt's algorithm requires a multiplier, so with the addition of a few control lines the hardware can also perform multiplication, which makes the module even more versatile.

Most applications that use any division method depend on the precision the hardware delivers. For our implementation we assumed a lower precision: a sign bit, an integer bit, and 14 fraction bits, for values in [1,2) and (-2,-1]. Had the design been required to support a wider input range, we would have expanded the precision and modified the approximation scheme to account for a larger effective number of iterations. If a division is requested, the approximation multiplexors select the constant with the corresponding sign, 0.75 or -0.75; otherwise a square root is assumed and the value is 0.833. Selecting the correct approximation is essential for producing the proper result: the approximation must always fall to the left of the quotient when plotted on an exponentially decreasing graph.
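The 16-bit register format described above (a sign bit, an integer bit, and 14 fraction bits) can be modeled in a few lines of Python. This is a minimal sketch, assuming two's complement encoding and truncation toward negative infinity when a real value is converted; the helper names `to_fixed` and `from_fixed` are hypothetical and do not come from the design.

```python
import math

FRAC_BITS = 14                  # sign bit + integer bit + 14 fraction bits
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1

def to_fixed(x: float) -> int:
    """Encode x in [-2, 2) as a 16-bit two's complement pattern (truncating)."""
    if not -2.0 <= x < 2.0:
        raise ValueError("value outside the representable range [-2, 2)")
    return math.floor(x * (1 << FRAC_BITS)) & MASK

def from_fixed(bits: int) -> float:
    """Decode a 16-bit two's complement pattern back to a float."""
    bits &= MASK
    if bits & (1 << (WORD_BITS - 1)):   # sign bit set: the value is negative
        bits -= 1 << WORD_BITS
    return bits / (1 << FRAC_BITS)
```

For example, `to_fixed(0.75)` is 0x3000 and `to_fixed(-0.75)` is 0xD000; the resolution of the format is 2^-14.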

2 DIVISION
2.1 Goldschmidt Method

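The Goldschmidt method is an iterative method that computes the quotient Q = N/D using only multiplications and a subtraction from a constant. Each iteration gives a better approximation than the last, and the convergence is quadratic: the number of correct bits roughly doubles with each iteration. The numerator and denominator are multiplied by the same sequence of factors K_i, chosen so that the denominator approaches 1 and the numerator therefore approaches the quotient:

$$Q = \frac{N}{D} = \frac{N K_0 K_1 K_2 \cdots K_m}{D K_0 K_1 K_2 \cdots K_m}, \qquad K_0 = IA, \qquad K_{i+1} = 2 - D K_0 K_1 \cdots K_i$$

K0 is the initial approximation (IA) supplied by the multiplexor unit for the requested operation, and K1 = 2 - D*K0 is obtained from it.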

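Because the number of correct bits roughly doubles with each iteration, the number of iterations needed after the initial approximation is approximately ceil(log_2(n/p)), where n is the number of bits of precision required and p is the number of correct bits in the initial approximation.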
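The iteration can be checked numerically with a floating-point sketch (not the fixed-point hardware). The function names `goldschmidt_divide` and `iterations_for` are hypothetical; `iterations_for` uses an idealized model in which the number of correct bits exactly doubles on each step after the first multiplication by IA.

```python
def goldschmidt_divide(n: float, d: float, ia: float, steps: int = 6):
    """Return (q, r, k) for each step of a floating-point Goldschmidt division.

    q = N*K0*...*Ki approaches N/D, r = D*K0*...*Ki approaches 1,
    and k = 2 - r is the next correction factor.
    """
    q, r = n * ia, d * ia              # step 1: multiply both by K0 = IA
    rows = []
    for _ in range(steps):
        k = 2.0 - r                    # next factor K_{i+1}
        rows.append((q, r, k))
        q, r = q * k, r * k            # scale numerator and denominator alike
    return rows

def iterations_for(target_bits: float, initial_bits: float) -> int:
    """Idealized count of doublings needed to go from initial_bits to target_bits."""
    count, bits = 0, initial_bits
    while bits < target_bits:
        bits *= 2                      # quadratic convergence doubles the correct bits
        count += 1
    return count
```

For N = 1.923, D = 1.523, and IA = 0.75, the first step gives q = 1.44225, r = 1.14225, and K1 = 0.85775, matching the first row of the corresponding division spreadsheet. Starting from the roughly 1.13 correct bits of the D = 1.013 case, `iterations_for(16, 1.132667075)` returns 4, i.e. five steps including the initial multiplication.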

HIGH SPEED PROJECT SPRAT, STORM

3 Hardware Modules


A. Initial Approximation

The initial approximation (IA) module consists of three cascaded 16-bit multiplexors that select a signed approximation based on the requested mathematical function. The input values are in [1,2) or (-2,-1]. There are only three constant approximations: 0.75 and -0.75, which are used for division according to the sign, and 0.833, which is used for square root. A 2-bit selection input drives the cascaded multiplexors to choose the approximation for the requested operation.
B. Multiplexers

We have a total of three multiplexor sets that implement the expected mathematical operation in each iteration of the pipeline stages.

Multiplexor A selects among the 16-bit values from the initial approximation unit, the inputs N and D, and the two registers C and D. A 3-bit select input determines the 16-bit output that is propagated to the multiplier. Every value passed to the multiplier through the multiplexors is extended with two additional bits for higher precision during multiplication. The table below lists the selection logic for multiplexor A.

[Table: Multiplexor A selection logic. 3-bit select (000 to 111); outputs are drawn from N, D, IA, Regc, and Regd, and select 000 is unused.]

Multiplexor B selects a 16-bit output from the initial approximation, register A, and register B; its purpose is to provide the second operand for the CSAM multiplier. The logic table for multiplexor B is listed below.

Mux Select (2 bits)    Output (16 bits)
00                     Rega
01                     Regb
10                     Rega
11                     IA

Multiplexor Twos propagates either a 16-bit two or a 16-bit three to the 2's complement module, to perform a division or a square root respectively. The logic table is provided below.

Mux Select (1 bit)     Output (16 bits)
0                      16'h4000
1                      16'h6000
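A behavioral Python model of multiplexor B and the Twos multiplexor follows directly from the two tables. The helper `extend_to_18` is hypothetical and assumes the two extra multiplier bits are appended as zeros at the least-significant end, turning 14 fraction bits into 16.

```python
def mux_b(select: int, rega: int, regb: int, ia: int) -> int:
    """Multiplexor B: second multiplier operand, per the 2-bit table."""
    table = {0b00: rega, 0b01: regb, 0b10: rega, 0b11: ia}
    return table[select & 0b11]

def mux_twos(select: int) -> int:
    """Multiplexor Twos: constant sent to the 2's complement module."""
    return 0x6000 if select & 1 else 0x4000

def extend_to_18(value16: int) -> int:
    """Hypothetical operand extension: append two zero LSBs (14 -> 16 fraction bits)."""
    return (value16 & 0xFFFF) << 2
```

Appending zeros on the right keeps the sign bit in the most-significant position, so `extend_to_18(0x6000)` is 0x18000, which is still 1.5 with 16 fraction bits.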

C. Signed Carry Save Array Multiplier (CSAM)


The Carry Save Array Multiplier (CSAM) is a pipelined module that lets the external registers store the appropriate product. Our multiplier is 36 bits wide, with two 18-bit inputs from the multiplexors in front of it. The operands carry a sign bit, an integer bit, and 16 fraction bits. Three 18-bit registers with load (latch) control are placed between the multiplication array and the Carry Propagate Adders (CPAs); this is required for pipelining. These registers hold the carry and sum vectors that are propagated into the CPAs. The implementation uses 16-bit external registers, so all inputs and outputs must be extended or truncated to match the processing requirements. The two most-significant bits are always removed after the CPA sum is calculated, and the result is narrowed back to 16 bits; in effect, this truncation takes the place of a rounding unit.
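The CSAM datapath can be sketched behaviorally in Python. This sketch assumes both 18-bit operands are sign-extended to 36 bits (rather than using a gate-level signed array such as Baugh-Wooley), accumulates partial products with 3:2 carry-save steps, and then performs one carry-propagate addition. `narrow_product` is a hypothetical reading of the output step: it removes the two MSBs and keeps the next 16 bits, which for operands with 16 fraction bits yields the sign, integer, and 14-fraction-bit format.

```python
WIDTH = 36
MASK36 = (1 << WIDTH) - 1

def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def csa_multiply(a18: int, b18: int) -> int:
    """Multiply two 18-bit two's complement operands using carry-save accumulation.

    Both operands are sign-extended to 36 bits, so the product modulo 2**36
    is the exact signed product.
    """
    x = to_signed(a18, 18) & MASK36
    y = to_signed(b18, 18) & MASK36
    s, c = 0, 0                                    # sum and carry vectors
    for i in range(WIDTH):
        if (y >> i) & 1:
            pp = (x << i) & MASK36                 # partial product row i
            # 3:2 compression (a row of full adders): no carry propagation here
            s, c = s ^ c ^ pp, (((s & c) | (s & pp) | (c & pp)) << 1) & MASK36
    return (s + c) & MASK36                        # final carry-propagate addition

def narrow_product(p36: int) -> int:
    """Drop the two MSBs and keep the next 16 bits (sign, integer, 14 fraction)."""
    return (p36 >> 18) & 0xFFFF
```

For example, 1.5 x 0.75 (0x18000 and 0x0C000 as 18-bit operands) narrows to 0x4800, which is 1.125 in the 16-bit format.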

D. 2's Complement Module

The complement subtractor must support both division and square root. An input multiplexor selects the constant used for the requested operation: a 16-bit value of 2 from the Twos multiplexor for division, and a 16-bit value of 3 for square root. Our 2's complement module has 16-bit inputs and outputs, but they are modified to include one additional integer bit for the multiplier value being used: one bit is inserted after the left-most bit of the multiplier value, and one bit is appended to the right of the right-most bit of the subtractor value. After the subtraction is complete, the additional integer bit is removed before the output is propagated.
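A bit-level Python sketch of the 2's complement module is given below. It assumes the Twos multiplexor constants (16'h4000 for 2, 16'h6000 for 3) are in a sign, two-integer-bit, 13-fraction-bit format, so that appending a zero on the right aligns them with the sign-extended 17-bit operand. The optional halving for square root, and its placement before the extra integer bit is dropped, is an assumption about where the one-bit shift happens; `twos_complement_unit` is a hypothetical name.

```python
MASK16, MASK17 = 0xFFFF, 0x1FFFF

def twos_complement_unit(x16: int, const16: int, halve: bool = False) -> int:
    """Compute const - x on 17 bits; return 16 bits (sign, integer, 14 fraction)."""
    # Insert a copy of the sign bit after the MSB (sign extension to 17 bits).
    x17 = (x16 & MASK16) | (((x16 >> 15) & 1) << 16)
    # Append one zero bit on the right of the constant.
    c17 = (const16 & MASK16) << 1
    # Two's complement subtraction: const + ~x + 1.
    diff17 = (c17 + (~x17 & MASK17) + 1) & MASK17
    if halve:
        # Square root: arithmetic right shift by one on 17 bits, giving (3 - x) / 2.
        diff17 = (diff17 >> 1) | (diff17 & (1 << 16))
    # Drop the extra integer bit (bit 15) to return to 16 bits.
    return (((diff17 >> 16) & 1) << 15) | (diff17 & 0x7FFF)
```

For example, `twos_complement_unit(0x3000, 0x4000)` computes 2 - 0.75 and returns 0x5000 (1.25), and with `halve=True` and the constant 0x6000 the same input gives (3 - 0.75)/2 = 1.125, or 0x4800.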
4 STATE TABLES
4.1 State Table for Division

[Table: Division state table for clock cycles 0 to 14. Columns: CLK, mux_selecta, mux_selectb, mux_twos_select, rega_out, regb_out, regc_out, regd_out, rega_load, regb_load, regc_load, regd_load. The register contents progress from N*IA, D*IA, and 2-D*IA through N*K0*K1*K2*K3*K4 and D*K0*K1*K2*K3*K4.]
4.2 State Table for Square Root

Every three clock cycles (once per iteration), the 16-bit output of the 2's complement module is shifted right by one bit before it propagates to the multiplexor; this performs the division by 2 required for square root.

[Table: Square-root state table for clock cycles 0 to 14, with the same columns as the division table. The register contents progress from N*IA, D*IA, and 3-D*IA through N*K0*K1*K2*K3*K4 and D*K0*K1*K2*K3*K4.]
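The square-root iteration tabulated in the error analysis (columns q*K, r*K, and (3-D*Xi)/2) can be reproduced with a floating-point sketch. The spreadsheet uses IA = 0.853553391 for square root; the function name `goldschmidt_sqrt` is hypothetical, and the update r <- r*K^2 is inferred from the spreadsheet values rather than stated in the text.

```python
def goldschmidt_sqrt(n: float, ia: float, steps: int = 6):
    """Return (q, r, k) per step of a floating-point Goldschmidt square root.

    q approaches sqrt(n), r approaches 1, and k = (3 - r) / 2.
    """
    q, r = n * ia, n * ia * ia         # q = N*Y0, r = N*Y0^2 with Y0 = IA
    rows = []
    for _ in range(steps):
        k = (3.0 - r) / 2.0            # the one-bit right shift halves (3 - r)
        rows.append((q, r, k))
        q, r = q * k, r * k * k        # r takes the factor twice to stay N*Y^2
    return rows
```

For N = 1.231 the first row is approximately (1.050724224, 0.896849224, 1.051575388), and q converges to sqrt(1.231) = 1.109504394.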
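5 ERROR ANALYSIS

5.1 Error Analysis for Division
The analysis for division and square root was performed using the Excel spreadsheets provided in the course. In the division tables, q*K is the running numerator, r*K is the running denominator, and 2-D*Xi is the next correction factor; in all tables, Error is the absolute difference from the true value and #bits is log_2(Error).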

1. N = 1.923, D = 1.523, IA = 0.75, N/D = 1.262639527

Iteration  q*K          r*K          2-D*Xi       Error        #bits
1          1.44225      1.14225      0.85775      0.179610473  -2.477056622
2          1.237089938  0.979764938  1.020235063  0.02554959   -5.290556064
3          1.26212253   0.999590542  1.000409458  0.000516998  -10.91755495
4          1.262639316  0.999999832  1.000000168  2.11689E-07  -22.17155272
5          1.262639527  1            1            3.59712E-14  -44.66015
6          1.262639527  1            1            6.66134E-16  -50.4150375

2. N = 1.923, D = 1.013, IA = 0.75, N/D = 1.898321816

Iteration  q*K          r*K          2-D*Xi       Error        #bits
1          1.44225      0.75975      1.24025      0.456071816  -1.132667075
2          1.788750563  0.942279938  1.057720063  0.109571254  -3.190058739
3          1.891997357  0.996668394  1.003331606  0.00632446   -7.304842067
4          1.898300746  0.9999889    1.0000111    2.10706E-05  -15.53440872
5          1.898321816  1            1            2.33875E-10  -31.99354105
6          1.898321816  1            1            2.22045E-16  -52

3. N = -1.012, D = 1.9123, IA = 0.75, N/D = -0.529205669

Iteration  q*K           r*K          2-D*Xi       Error        #bits
1          -0.759        1.434225     0.565775     0.229794331  -2.121584885
2          -0.429423225  0.811448649  1.188551351  0.099782444  -3.32507019
3          -0.510391554  0.964448388  1.035551612  0.018814115  -5.7320408
4          -0.528536796  0.998736083  1.001263917  0.000668872  -10.54598202
5          -0.529204823  0.999998403  1.000001597  8.45399E-07  -20.17386446
6          -0.529205669  1            1            1.35048E-12  -39.4296699

4. N = 1.012, D = 1.9123, IA = 0.75, N/D = 0.529205669

Iteration  q*K          r*K          2-D*Xi       Error        #bits
1          0.759        1.434225     0.565775     0.229794331  -2.121584885
2          0.429423225  0.811448649  1.188551351  0.099782444  -3.32507019
3          0.510391554  0.964448388  1.035551612  0.018814115  -5.7320408
4          0.528536796  0.998736083  1.001263917  0.000668872  -10.54598202
5          0.529204823  0.999998403  1.000001597  8.45399E-07  -20.17386446
6          0.529205669  1            1            1.35048E-12  -39.4296699

In all of the above cases, the error is smaller than 2^-16 (more than 16 bits of accuracy) after five iterations, which shows that the design should be accurate to the number of bits it uses.
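The Error and #bits columns can be recomputed from the q*K column. This sketch uses the spreadsheet values for N = 1.923, D = 1.013; because those values are printed to nine decimals, a step whose estimate matches the true value exactly reports an error of zero (and #bits of minus infinity). The helper names `error_bits` and `first_step_with_bits` are hypothetical.

```python
import math

def error_bits(true_value: float, estimates):
    """Return (error, log2(error)) per estimate, mirroring the Error and #bits columns."""
    rows = []
    for q in estimates:
        err = abs(true_value - q)
        rows.append((err, math.log2(err) if err > 0 else float("-inf")))
    return rows

def first_step_with_bits(rows, target_bits: int = 16):
    """1-based index of the first step whose error is below 2**-target_bits, else None."""
    for step, (_, bits) in enumerate(rows, start=1):
        if bits <= -target_bits:
            return step
    return None

# q*K column for N = 1.923, D = 1.013 from the division spreadsheet.
Q_COLUMN = [1.44225, 1.788750563, 1.891997357, 1.898300746, 1.898321816, 1.898321816]
rows = error_bits(1.898321816, Q_COLUMN)
```

Here `first_step_with_bits(rows)` returns 5: the fourth step is still at about -15.5 bits, so the fifth step is the first to pass 16 bits.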


5.2 Error Analysis for Square Root

1. N = 1.231, D = 1.231, N/D = 1, IA = 0.853553391, true value 1.109504394

Iteration  q*K          r*K          (3-D*Xi)/2   Error        #bits
1          1.050724224  0.896849224  1.051575388  0.05878017   -4.088526657
2          1.104915733  0.991745555  1.004127223  0.00458866   -7.767711238
3          1.109475967  0.999948757  1.000025621  2.84273E-05  -15.10236562
4          1.109504393  0.999999998  1.000000001  1.09252E-09  -29.76969629
5          1.109504394  1            1            2.22045E-16  -52
6          1.109504394  1            1            2.22045E-16  -52

2. N = 1.99, D = 1.99, N/D = 1, IA = 0.853553391, true value 1.410673598

Iteration  q*K          r*K          (3-D*Xi)/2   Error        #bits
1          1.698571247  1.449821247  0.775089376  0.287897649  -1.796372086
2          1.316544529  0.870999747  1.064500127  0.094129069  -3.409215861
3          1.401461818  0.986982526  1.006508737  0.00921178   -6.762304257
4          1.410583564  0.999872358  1.000063821  9.00338E-05  -13.43917395
5          1.410673589  0.999999988  1.000000006  8.61919E-09  -26.78980037
6          1.410673598  1            1            2.22045E-16  -52

3. N = 1.01, D = 1.01, N/D = 1, IA = 0.853553391, true value 1.004987562

Iteration  q*K          r*K          (3-D*Xi)/2   Error        #bits
1          0.862088924  0.735838924  1.132080538  0.142898638  -2.806935933
2          0.975954093  0.943055834  1.028472083  0.029033469  -5.106139236
3          1.003741539  0.997521859  1.00123907   0.001246023  -9.6484538
4          1.004985246  0.99999539   1.000002305  2.31634E-06  -18.7197191
5          1.004987562  1            1            8.00848E-12  -36.86160819
6          1.004987562  1            1            2.22045E-16  -52

4. N = 1.7532, D = 1.7532, N/D = 1, IA = 0.853553391, true value 1.324084589

Iteration  q*K          r*K          (3-D*Xi)/2   Error        #bits
1          1.496449804  1.277299804  0.861350098  0.172365215  -2.536459442
2          1.288967185  0.947659369  1.026170316  0.035117404  -4.831669987
3          1.322699864  0.997909496  1.001045252  0.001384726  -9.496183876
4          1.324082418  0.99999672   1.00000164   2.17146E-06  -18.81290355
5          1.324084589  1            1            5.34195E-12  -37.44577091
6          1.324084589  1            1            2.22045E-16  -52

After five iterations the error is below 2^-16 (more than 16 bits of accuracy) in all of the simulations, showing that five iterations are sufficient to produce enough accuracy.

6 CONCLUSION
Analyzing the Goldschmidt division and square-root implementation has provided an in-depth, iteration-by-iteration representation of the mathematical process flow. The error analysis shows that we need a minimum of five iterations in general, and that after the final iteration the spreadsheet results reach roughly 39 bits of accuracy for division and 52 for square root. Our state tables show the iteration process by listing the multiplexor selection values, the register load values, and the division or square-root expression held in each register after every iteration. We added a stall to give the multiplier time to store into its internal registers, so that overflowing multiplications would not occur and the hardware could be successfully pipelined.


7 APPENDICES
7.1 Specialized Full Adder

7.2 Carry Save Array Multiplier

7.3 Subtractor

7.4 Overall Project Design

7.5 MUXa
