Professional Documents
Culture Documents
R Dhanabal
ABSTRACT
To represent very large or small values, large range is
required as the integer representation is no longer appropriate.
These values can be represented using the IEEE-754 standard
based floating point representation. This paper presents high
speed ASIC implementation of a floating point arithmetic unit
which can perform addition, subtraction, multiplication,
division functions on 32-bit operands that use the IEEE 7542008 standard. Pre-normalization unit and post normalization
units are also discussed along with exceptional handling. All
the functions are built by feasible efficient algorithms with
several changes incorporated that can improve overall latency,
and if pipelined then higher throughput. The algorithms are
modeled in Verilog HDL and have been implemented in
ModelSim.
Keywords
Floating point number, normalization, exceptions, latency,
etc.
1. INTRODUCTION
An arithmetic circuit which performs digital arithmetic
operations has many applications in digital coprocessors,
application specific circuits, etc. Because of the advancements
in the VLSI technology, many complex algorithms that
appeared impractical to put into practice, have become easily
realizable today with desired performance parameters so that
new designs can be incorporated [2]. The standardized
methods to represent floating point numbers have been
instituted by the IEEE 754 standard through which the
floating point operations can be carried out efficiently with
modest storage requirements,.
The three basic components in IEEE 754 standard floating
point numbers are the sign, the exponent, and the mantissa
[3]. The sign bit is of 1 bit where 0 refers to positive number
and 1 refers to negative number [3]. The mantissa, also called
significand which is of 23bits composes of the fraction and a
leading digit which represents the precision bits of the number
[3] [2]. The exponent with 8 bits represents both positive and
negative exponents. A bias of 127 is added to the exponent to
get the stored exponent [2]. Table 1 show the bit ranges for
single (32-bit) and double (64-bit) precision floating-point
values [2].
The value of binary floating point representation is as follows
where S is sign bit, F is fraction bit and E is exponent field.
Value of a floating point number= (-1)S x val(F) x 2val(E)
Sign
1[31]
Exponent
8[30-23]
Fraction
23[22-00]
Bias
127
1[63]
11[62-52]
52[51-00]
1023
2. ARCHITECTURE AND
METHODOLOGY
The FPU of a single precision floating point unit that performs
add, subtract, multiply, divide functions is shown in figure 1
[1]. Two pre-normalization units for addition/subtraction and
multiplication/division operations has been given[1]. Post
normalization unit also has been given that normalizes the
mantissa part[2]. The final result can be obtained after postnormalization. To carry out the arithmetic operations, two
IEEE-754 format single precision operands are considered.
Pre-normalization of the operands is done. Then the selected
operation is performed followed by post-normalizing the
output obtained .Finally the exceptions occurred are detected
and handled using exceptional handling. The executed
operation depends on a two bit control signal (z) which will
determine the arithmetic operation is shown in table 2.
32
Operation
Addition
Subtraction
Multiplication
Division
3.3 Multiplication
3. 32
BIT
FLOATING
ARITHMETIC UNIT
3.1 Addition Unit
POINT
33
a1 = x0y0.
b1 = p1- z2 - z0
p1 = (x1 + x0)(y1 + y0)
Here c1, a1, p1 has been calculated using bit pair recoding
algorithm. Radix-4 modified booth encoding has been used
which allows for the reduction of partial product array by half
[n/2]. The bit pair recoding table is shown in table 3. In the
implemented algorithm for each group of three bits (y2i1,
y2i, y2i_1) of multiplier, one partial product row is generated
according to the encoding in table 3. Radix-4 modified booth
encoding signals and their respective partial products has been
generated using the figures 4 and 5. For each partial product
row, figure 4 generates the one, two, and neg signals. These
values are then given to the logic in figure 5 with the bits of
the multiplicand, to produce the whole partial product array.
To prevent the sign extension the obtained partial products are
extended as shown in figure 6 and the the product has been
calculated using carry save select adder.
Table 3. Bit-Pair Recoding [11]
BIT
PATTERN
0
0 0
0
0
0
1
1
1
1
0
1
1
0
0
1
1
1
0
1
0
1
0
1
OPERATION
NO
OPERATION
1xa
2xa-a
2xa
-2xa
-2xa+a
-1xa
NO
OPERATION
prod=prod+a;
prod=prod+a;
prod=prod+2a;
prod=prod-2a;
prod=prod-a;
prod=prod-a;
34
4. RESULTS
4.1 Addition Unit
The single precision addition operation has been
implementation in modelsim for the inputs, input1=25.0 and
input2=4.5 which is input2=4.5 shown in figure 8 for which
result has been obtained as 29.5
35
5. REFERENCES
[1] Rudolf Usselmann, Open Floating Point Unit, The Free
IP Cores Projects.
[2]
36