
REDUCING THE COMPUTATION TIME IN (SHORT BIT-WIDTH) TWO'S COMPLEMENT MULTIPLIERS


ABSTRACT:
Two's complement multipliers are important for a wide range of applications. In this paper, we present a technique to reduce by one row the maximum height of the partial product array generated by a radix-4 Modified Booth Encoded multiplier, without any increase in the delay of the partial product generation stage. This reduction may allow for a faster compression of the partial product array and regular layouts. This technique is of particular interest in all multiplier designs, but especially in short bit-width two's complement multipliers for high-performance embedded cores. The proposed method is general and can be extended to higher radix encodings, as well as to any size square and m x n rectangular multipliers. We evaluated the proposed approach by comparison with some other possible solutions; the results, based on a rough theoretical analysis and on logic synthesis, showed its efficiency in terms of both area and delay.
Introduction About Verilog
Overview:
Hardware description languages such as Verilog differ from software programming languages because they include ways of describing the propagation of time and signal dependencies (sensitivity). There are two assignment operators: a blocking assignment (=) and a non-blocking (<=) assignment. The non-blocking assignment allows designers to describe a state-machine update without needing to declare and use temporary storage variables (in a general programming language, temporary storage would have to be defined for the operands to be operated on subsequently; those are the temporary storage variables). Since these concepts are part of Verilog's language semantics, designers could quickly write descriptions of large circuits in a relatively compact and concise form. At the time of Verilog's introduction (1984), Verilog represented a tremendous productivity improvement for circuit designers who were already using graphical schematic capture software and specially written software programs to document and simulate electronic circuits.
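As an illustration of the non-blocking assignment described above, the following minimal sketch updates two registers without a temporary variable; the module and signal names are illustrative and not taken from this report.

    // Non-blocking (<=) updates: both right-hand sides are sampled before
    // any register changes, so the two values swap without temporary storage.
    module swap_example (
      input  wire       clk,
      input  wire       load,
      input  wire [7:0] a_in, b_in,
      output reg  [7:0] a, b
    );
      always @(posedge clk) begin
        if (load) begin
          a <= a_in;
          b <= b_in;
        end else begin
          a <= b;   // with blocking (=) assignments a temporary would be needed here
          b <= a;
        end
      end
    endmodule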
The designers of Verilog wanted a language with syntax similar to the C programming language, which was already widely used in engineering software development. Verilog is case-sensitive, has a basic preprocessor (though less sophisticated than that of ANSI C/C++), equivalent control-flow keywords (if/else, for, while, case, etc.), and compatible operator precedence. Syntactic differences include variable declaration (Verilog requires bit-widths on net/reg types), demarcation of procedural blocks (begin/end instead of curly braces {}), and many other minor differences.
A Verilog design consists of a hierarchy of modules. Modules encapsulate design hierarchy, and communicate with other modules through a set of declared input, output, and bidirectional ports. Internally, a module can contain any combination of the following: net/variable declarations (wire, reg, integer, etc.), concurrent and sequential statement blocks, and instances of other modules (sub-hierarchies). Sequential statements are placed inside a begin/end block and executed in sequential order within the block. But the blocks themselves are executed concurrently, qualifying Verilog as a dataflow language.
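The following minimal sketch (with illustrative names) shows this structure: a sub-module, an instance of it, net declarations, concurrent assignments, and a sequential begin/end block.

    module half_adder (
      input  wire a, b,
      output wire sum, carry
    );
      assign sum   = a ^ b;   // concurrent statements
      assign carry = a & b;
    endmodule

    module top (
      input  wire clk, x, y,
      output reg  sum_r, carry_r
    );
      wire s, c;                                            // net declarations
      half_adder u_ha (.a(x), .b(y), .sum(s), .carry(c));   // instance (sub-hierarchy)
      always @(posedge clk) begin                           // sequential block
        sum_r   <= s;
        carry_r <= c;
      end
    endmodule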
Verilog's concept of a 'wire' consists of both signal values (4-state: "1, 0, floating, undefined") and strengths (strong, weak, etc.). This system allows abstract modeling of shared signal lines, where multiple sources drive a common net. When a wire has multiple drivers, the wire's (readable) value is resolved by a function of the source drivers and their strengths.
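A small sketch of such a shared net (illustrative names): two tri-state drivers on one wire; when neither is enabled the wire floats (Z), and conflicting drivers of equal strength resolve to X.

    module shared_bus (
      input  wire en0, en1, d0, d1,
      output wire bus
    );
      assign bus = en0 ? d0 : 1'bz;   // driver 0
      assign bus = en1 ? d1 : 1'bz;   // driver 1: the simulator resolves the two drivers
    endmodule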

A subset of statements in the Verilog language is synthesizable. Verilog modules that conform to a synthesizable coding style, known as RTL (register transfer level), can be physically realized by synthesis software. Synthesis software algorithmically transforms the (abstract) Verilog source into a netlist, a logically equivalent description consisting only of elementary logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a specific FPGA or VLSI technology. Further manipulations to the netlist ultimately lead to a circuit fabrication blueprint (such as a photomask set for an ASIC, or a bitstream file for an FPGA).
Verilog HDL History
Beginning:
Verilog was the first modern hardware description language to be invented. It was created by Phil Moorby and Prabhu Goel during the winter of 1983/1984, as a hardware modeling language at "Automated Integrated Design Systems" (renamed Gateway Design Automation in 1985). Gateway Design Automation was purchased by Cadence Design Systems in 1990. Cadence now has full proprietary rights to Gateway's Verilog and to Verilog-XL, the HDL simulator that would become the de facto standard (of Verilog logic simulators) for the next decade. Originally, Verilog was intended to describe and allow simulation; only afterwards was support for synthesis added.
Verilog-95:
With the increasing success of VHDL at the time, Cadence decided to make the language available for open standardization. Cadence transferred Verilog into the public domain under the Open Verilog International (OVI) (now known as Accellera) organization. Verilog was later submitted to IEEE and became IEEE Standard 1364-1995, commonly referred to as Verilog-95.
In the same time frame Cadence initiated the creation of Verilog-A to put standards support behind its analog simulator Spectre. Verilog-A was never intended to be a standalone language and is a subset of Verilog-AMS, which encompasses Verilog-95.
Verilog 2001:
Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that users had found in the original Verilog standard. These extensions became IEEE Standard 1364-2001, known as Verilog-2001.
Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for (2's complement) signed nets and variables. Previously, code authors had to perform signed operations using awkward bit-level manipulations (for example, the carry-out bit of a simple 8-bit addition required an explicit description of the Boolean algebra to determine its correct value). The same function under Verilog-2001 can be more succinctly described by one of the built-in operators: +, -, /, *, >>>. A generate/endgenerate construct (similar to VHDL's generate/end generate) allows Verilog-2001 to control instance and statement instantiation through normal decision operators (case/if/else). Using generate/endgenerate, Verilog-2001 can instantiate an array of instances, with control over the connectivity of the individual instances. File I/O has been improved by several new system tasks. And finally, a few syntax additions were introduced to improve code readability (e.g. always @*, named parameter override, C-style function/task/module header declaration).
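A brief Verilog-2001 sketch of the features just listed (illustrative names): a carry-out taken directly from the built-in '+' operator, a signed net with the arithmetic shift operator, and a generate/endgenerate loop replicating instances.

    module v2001_features #(
      parameter N = 8
    ) (
      input  wire        [N-1:0] a, b,
      output wire        [N-1:0] sum,
      output wire                cout,
      input  wire signed [N-1:0] s_in,
      output wire signed [N-1:0] s_sh,
      input  wire        [N-1:0] x, y,
      output wire        [N-1:0] z
    );
      assign {cout, sum} = a + b;   // carry-out without explicit Boolean algebra
      assign s_sh = s_in >>> 1;     // signed net, arithmetic shift (Verilog-2001)

      genvar i;
      generate                      // instance array under loop control
        for (i = 0; i < N; i = i + 1) begin : g_and
          and u_and (z[i], x[i], y[i]);
        end
      endgenerate
    endmodule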
Verilog-2001 is the dominant flavor of Verilog supported by the majority of commercial EDA software packages.
Introduction About Multiplication:
Multiplication (often denoted by the cross symbol "x") is the mathematical operation of scaling one number by another. It is one of the four basic operations in elementary arithmetic (the others being addition, subtraction and division).
Multiplication:
If a positional numeral system is used, a natural way of multiplying numbers is taught in schools as long multiplication, sometimes called grade-school multiplication: multiply the multiplicand by each digit of the multiplier and then add up all the properly shifted results. It requires memorization of the multiplication table for single digits.
This is the usual algorithm for multiplying larger numbers by hand in base 10. Computers normally use a very similar shift-and-add algorithm in base 2 (a behavioral sketch of this scheme follows the example below). A person doing long multiplication on paper will write down all the products and then add them together; an abacus user will sum the products as soon as each one is computed.
Example:
This example uses long multiplication to multiply 23,958,233 (multiplicand) by 5,830 (multiplier) and arrives at 139,676,498,390 for the result (product).
        23958233
            5830 x
    ------------
        00000000 ( = 23,958,233 x 0)
        71874699 ( = 23,958,233 x 30)
       191665864 ( = 23,958,233 x 800)
       119791165 ( = 23,958,233 x 5,000)
    ------------
    139676498390 ( = 139,676,498,390)
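The base-2 shift-and-add scheme mentioned above can be sketched behaviorally as follows (unsigned operands, illustrative names): each multiplier bit that is 1 adds a properly shifted copy of the multiplicand.

    module shift_add_mult #(
      parameter N = 8
    ) (
      input  wire [N-1:0]   a,   // multiplicand
      input  wire [N-1:0]   b,   // multiplier
      output reg  [2*N-1:0] p    // product
    );
      integer i;
      always @* begin
        p = {2*N{1'b0}};
        for (i = 0; i < N; i = i + 1)
          if (b[i])
            p = p + ({{N{1'b0}}, a} << i);   // add the i-th shifted row
      end
    endmodule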
Multiplication algorithm:
A multiplication algorithm is an algorithm (or method) to multiply two numbers. Depending on the size of the numbers, different algorithms are in use. Efficient multiplication algorithms have existed since the advent of the decimal system.
Types of multiplication algorithms:
1. Booth's Algorithm
2. Modified Booth's Algorithm
3. Wallace Tree Algorithm
Booth's Algorithm:
Booth's algorithm is a multiplication algorithm that works on two's complement numbers. It is similar to the paper-and-pencil method, except that it looks at the current bit as well as the previous bit in order to decide what to do. The steps are as follows (a behavioral sketch is given after the examples below):
If the current multiplier digit is 1 and the earlier digit is 0 (i.e. a 10 pair), shift and sign-extend the multiplicand and subtract it from the previous result.
If it is a 01 pair, add it to the previous result.
If it is a 00 pair, or an 11 pair, do nothing.
Let's look at a few examples.
4 bits
     0110 <- 6
   x 0010 <- 2
  -------------
    00000000
  -    0110
  -------------
    11110100
  +   0110
  -------------
 (1)00001100 <- 12 (overflow bit ignored)
8 bits
In Booth's algorithm, if the multiplicand and multiplier are n-bit two's complement numbers, the result is considered as a 2n-bit two's complement value. The overflow bit (outside the 2n bits) is ignored.
The reason that the above computation works is that
0110 x 0010 = 0110 x (-0010 + 0100) = -01100 + 011000 = 1100.
Example 2:
     0010
   x 0110
  ------------
    00000000
  -    0010
  -------------
    11111100
  +  0010
  -------------
 (1)00001100
In this we have computed
0010 x 0110 = 0010 x (-0010 + 1000) = -00100 + 0010000 = 1100
Example 3, (-5) x (-3):
     1011 -> -5 (4-bit two's complement)
   x 1101 -> -3
  -----------
    00000000
  - 11111011   (notice the sign extension of the multiplicand)
  ------------
    00000101
  + 1111011
  -------------
    11111011
  - 111011
  -------------
    00001111 -> +15
A long example:
     10011100 <- -100
   x 01100011 <-   99
  --------------------
    00000000 00000000
  - 11111111 10011100
  --------------------
    00000000 01100100
  + 11111110 011100
  --------------------
    11111110 11010100
  - 11110011 100
  --------------------
    00001011 01010100
  + 11001110 0
  --------------------
    11011001 01010100 <- -9900
Note that the multiplicand and multiplier are 8-bit two's complement numbers, but the result is understood as a 16-bit two's complement number. Be careful about the proper alignment of the columns: a 10 pair causes a subtraction, aligned with the 1, and a 01 pair causes an addition, aligned with the 0; in both cases, it aligns with the digit on the left. The algorithm starts with the 0-th bit. We should assume that there is a (-1)-th bit, having value 0.
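The radix-2 Booth rule worked through in the examples above can be captured in a short behavioral sketch (signed operands, illustrative names): scan the pairs (y[i], y[i-1]) with y[-1] = 0, subtract the shifted multiplicand on a 10 pair, and add it on a 01 pair.

    module booth_radix2 #(
      parameter N = 8
    ) (
      input  wire signed [N-1:0]   x,   // multiplicand
      input  wire signed [N-1:0]   y,   // multiplier
      output reg  signed [2*N-1:0] p    // 2N-bit two's complement product
    );
      integer i;
      reg prev;                          // y[i-1], starts at 0
      always @* begin
        p    = {2*N{1'b0}};
        prev = 1'b0;
        for (i = 0; i < N; i = i + 1) begin
          case ({y[i], prev})
            2'b10:   p = p - (x <<< i);  // 10 pair: subtract shifted multiplicand
            2'b01:   p = p + (x <<< i);  // 01 pair: add shifted multiplicand
            default: ;                   // 00 or 11 pair: do nothing
          endcase
          prev = y[i];
        end
      end
    endmodule

For the first example above, x = 4'b0110 and y = 4'b0010 give p = 8'b00001100 (12), matching the hand computation.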
Booth Algorithm Advantages and Disadvantages:
Depends on the architecture.
Potential advantage: might reduce the number of 1's in the multiplier.
In the multipliers that we have seen so far:
Doesn't save in speed (we still have to wait for the critical path, e.g., the shift-add delay in a sequential multiplier).
Increases area: recoding circuitry AND subtraction.
Modified Booth:
Booth 2 modified to produce at most n/2 + 1 partial products.
Algorithm (for unsigned numbers):
1. Pad the LSB with one zero.
2. Pad the MSB with 2 zeros if n is even and 1 zero if n is odd.
3. Divide the multiplier into overlapping groups of 3 bits.
4. Determine the partial product scale factor from the modified Booth 2 encoding table.
5. Compute the multiplicand multiples.
6. Sum the partial products.
The digits can be encoded by looking at three bits at a time.
Booth recoding table:
1. Must be able to add the multiplicand times -2, -1, 0, +1 and +2.
2. Since Booth recoding got rid of 3's, generating partial products is not that hard (only shifting and negating).
i+1  i  i-1  add
 0   0   0    0 x M
 0   0   1   +1 x M
 0   1   0   +1 x M
 0   1   1   +2 x M
 1   0   0   -2 x M
 1   0   1   -1 x M
 1   1   0   -1 x M
 1   1   1    0 x M
Booth 2 modified to produce at most n/2 + 1 partial products.
Algorithm (for signed numbers):
1. Pad the LSB with one zero.
2. If n is even, don't pad the MSB (n/2 PP's); if n is odd, sign extend the MSB by 1 bit ((n+1)/2 PP's).
3. Divide the multiplier into overlapping groups of 3 bits.
4. Determine the partial product scale factor from the modified Booth 2 encoding table.
5. Compute the multiplicand multiples.
6. Sum the partial products.
Interpretation of the Booth recoding table:
i+1  i  i-1  add      Explanation
 0   0   0    0 x M   No string of 1's in sight
 0   0   1   +1 x M   End of a string of 1's
 0   1   0   +1 x M   Isolated 1
 0   1   1   +2 x M   End of a string of 1's
 1   0   0   -2 x M   Beginning of a string of 1's
 1   0   1   -1 x M   End one string, begin a new one
 1   1   0   -1 x M   Beginning of a string of 1's
 1   1   1    0 x M   Continuation of a string of 1's
Grouping multiplier bits into pairs:
Orthogonal idea to the Booth recoding.
Reduces the number of partial products to half.
If Booth recoding is not used, we must be able to multiply by 3 (hard: shift + add).
Applying the grouping idea to Booth gives Modified Booth Recoding (Encoding):
We already got rid of sequences of 1's, so there is no multiplication by 3; we just negate, and shift once or twice.
Uses a high radix to reduce the number of intermediate addition operands.
Can go higher: radix-8, radix-16.
Radix-8 would have to implement x3, x-3, x4, x-4.
Recoding and partial product generation become more complex.
Can automatically take care of signed multiplication.
(A gate-level sketch of the recoding table above follows.)
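As a sketch of the recoding table above, one 3-bit group (i+1, i, i-1) can be turned into the usual one/two/neg control signals as follows; the equations are a direct transcription of the table, and the signal names follow the common MBE convention rather than any specific figure of this report.

    module mbe_recode_group (
      input  wire yi_p1, yi, yi_m1,   // multiplier bits i+1, i, i-1
      output wire one,                // magnitude 1: select 1 x M
      output wire two,                // magnitude 2: select 2 x M
      output wire neg                 // the selected multiple must be negated
    );
      assign one = yi ^ yi_m1;                    // groups 001, 010, 101, 110
      assign two = (yi_p1 & ~yi & ~yi_m1) |       // group 100 -> -2 x M
                   (~yi_p1 & yi & yi_m1);         // group 011 -> +2 x M
      assign neg = yi_p1 & ~(yi & yi_m1);         // groups 100, 101, 110
    endmodule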
Wallace tree:
A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers, devised by the Australian computer scientist Chris Wallace in 1964.
The Wallace tree has three steps:
1. Multiply (that is, AND) each bit of one of the arguments by each bit of the other, yielding n^2 results. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of a2b3 has weight 32 (see the explanation of weights below).
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires in two numbers, and add them with a conventional adder.

The second phase works as follows. As long as there are three or more wires with the same weight, add a following layer:
Take any three wires with the same weight and input them into a full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each three input wires.
If there are two wires of the same weight left, input them into a half adder.
If there is just one wire left, connect it to the next layer.
The benefit of the Wallace tree is that there are only O(log n) reduction layers, and each layer has O(1) propagation delay. As making the partial products is O(1) and the final addition is O(log n), the multiplication is only O(log n), not much slower than addition (however, much more expensive in gate count). Naively adding partial products with regular adders would require O(log^2 n) time. From a complexity-theoretic perspective, the Wallace tree algorithm puts multiplication in the class NC^1.
These computations only consider gate delays and don't deal with wire delays, which can also be very substantial. The Wallace tree can also be represented by a tree of 3-2 or 4-2 adders. It is sometimes combined with Booth encoding.
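One reduction layer of the kind described above can be sketched as a carry-save row of full adders (illustrative names): three same-weight rows are compressed into a sum row of the same weight and a carry row of double weight.

    module csa_row #(
      parameter W = 8
    ) (
      input  wire [W-1:0] a, b, c,
      output wire [W-1:0] sum,     // same weight as the inputs
      output wire [W:0]   carry    // one weight higher (shifted left by one)
    );
      assign sum   = a ^ b ^ c;                             // full-adder sum bits
      assign carry = {(a & b) | (a & c) | (b & c), 1'b0};   // full-adder carry bits
    endmodule

Repeating such layers until only two rows remain, and then adding those two rows with a conventional adder, gives the O(log n) reduction discussed above.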
Weights explained
The weight of a wire is the radix (to base 2) of the digit that the wire carries. In general, a_n b_m has indexes n and m; and since 2^n * 2^m = 2^(n+m), the weight of a_n b_m is 2^(n+m).
Example
n = 4, multiplying a3a2a1a0 by b3b2b1b0:
1. First we multiply every bit by every bit:
   weight 1 - a0b0
   weight 2 - a0b1, a1b0
   weight 4 - a0b2, a1b1, a2b0
   weight 8 - a0b3, a1b2, a2b1, a3b0
   weight 16 - a1b3, a2b2, a3b1
   weight 32 - a2b3, a3b2
   weight 64 - a3b3
2. Reduction layer 1:
   Pass the only weight-1 wire through, output: 1 weight-1 wire
   Add a half adder for weight 2, outputs: 1 weight-2 wire, 1 weight-4 wire
   Add a full adder for weight 4, outputs: 1 weight-4 wire, 1 weight-8 wire
   Add a full adder for weight 8, and pass the remaining wire through, outputs: 2 weight-8 wires, 1 weight-16 wire
   Add a full adder for weight 16, outputs: 1 weight-16 wire, 1 weight-32 wire
   Add a half adder for weight 32, outputs: 1 weight-32 wire, 1 weight-64 wire
   Pass the only weight-64 wire through, output: 1 weight-64 wire
3. Wires at the output of reduction layer 1:
   weight 1 - 1
   weight 2 - 1
   weight 4 - 2
   weight 8 - 3
   weight 16 - 2
   weight 32 - 2
   weight 64 - 2
4. Reduction layer 2:
   Add a full adder for weight 8, and half adders for weights 4, 16, 32, 64
5. Outputs:
   weight 1 - 1
   weight 2 - 1
   weight 4 - 1
   weight 8 - 2
   weight 16 - 2
   weight 32 - 2
   weight 64 - 2
   weight 128 - 1
6. Group the wires into a pair of integers and use an adder to add them.
Two's complement:
The two's complement of a binary number is defined as the value obtained by subtracting the number from a large power of two (specifically, from 2^N for an N-bit two's complement). The two's complement of the number then behaves like the negative of the original number in most arithmetic, and it can coexist with positive numbers in a natural way.
A two's-complement system, or two's-complement arithmetic, is a system in which negative numbers are represented by the two's complement of the absolute value; this system is the most common method of representing signed integers on computers. In such a system, a number is negated (converted from positive to negative or vice versa) by computing its ones' complement and adding one. An N-bit two's-complement numeral system can represent every integer in the range -2^(N-1) to 2^(N-1) - 1, while ones' complement can only represent integers in the range -(2^(N-1) - 1) to 2^(N-1) - 1.
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic. Also, zero has only a single representation, obviating the subtleties associated with negative zero, which exists in ones'-complement systems.
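The negation rule just described (ones' complement plus one) is a one-liner in Verilog; a minimal sketch with illustrative names follows. For N = 8 the representable range given by the formula above is -128 to +127.

    module twos_complement #(
      parameter N = 8
    ) (
      input  wire signed [N-1:0] x,
      output wire signed [N-1:0] neg_x
    );
      assign neg_x = ~x + 1'b1;   // invert all bits, then add one
    endmodule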
REDUCING THE COMPUTATION TIME IN (SHORT BIT-WIDTH) TWO'S COMPLEMENT MULTIPLIERS
1. INTRODUCTION:
In multimedia, 3D graphics and signal processing applications, performance, in most cases, strongly depends on the effectiveness of the hardware used for computing multiplications, since multiplication is, besides addition, massively used in these environments. The high interest in this application field is witnessed by the large number of algorithms and implementations of the multiplication operation which have been proposed in the literature (for a representative set of references, see [1]). More specifically, short bit-width (8-16 bits) two's complement multipliers with single-cycle throughput and latency have emerged and become very important building blocks for high-performance embedded processors and DSP execution cores [2], [3]. In this case, the multiplier must be highly optimized to fit within the required cycle time and power budgets. Another relevant application for short bit-width multipliers is the design of SIMD units supporting different data formats [3], [4]. In this case, short bit-width multipliers often play the role of basic building blocks. Two's complement multipliers of moderate bit-width (less than 32 bits) are also being used massively in FPGAs. All of the above translates into a high interest and motivation on the part of industry for the design of high-performance short or moderate bit-width two's complement multipliers.
The basic algorithm for multiplication is based on the well-known paper-and-pencil approach [1] and passes through three main phases: 1) partial product (PP) generation, 2) PP reduction, and 3) final (carry-propagated) addition. During PP generation, a set of rows is generated where each one is the result of the product of one bit of the multiplier by the multiplicand. For example, if we consider the multiplication X x Y with both X and Y on n bits and of the form x_{n-1} . . . x_0 and y_{n-1} . . . y_0, then the i-th row is, in general, a proper left shifting of y_i x X, i.e., either a string of all zeros when y_i = 0, or the multiplicand X itself when y_i = 1. In this case, the number of PP rows generated during the first phase is clearly n.

Modified Booth Encoding (MBE) is a technique that has been introduced to reduce the number of PP rows, while still keeping the generation process of each row both simple and fast enough. One of the most commonly used schemes is radix-4 MBE, for a number of reasons, the most important being that it allows for the reduction of the size of the partial product array by almost half, and it is very simple to generate the multiples of the multiplicand. More specifically, the classic two's complement n x n bit multiplier using the radix-4 MBE scheme generates a PP array with a maximum height of n/2 + 1 rows, each row before the last one being one of the following possible values: all zeros, +-X, +-2X. The last row, which is due to the negative encoding, can be kept very simple by using specific techniques integrating two's complement and sign extension prevention [1].

The PP reduction is the process of adding all PP rows by using a compression tree [6], [7]. Since the knowledge of intermediate addition values is not important, the outcome of this phase is a result represented in redundant carry-save form, i.e., as two rows, which allows for much faster implementations. The final (carry-propagated) addition has the task of adding these two rows and of presenting the final result in a non-redundant form, i.e., as a single row.

In this work, we introduce an idea to overlap, to some extent, the PP generation and the PP reduction phases. Our aim is to produce a PP array with a maximum height of n/2 rows that is then reduced by the compressor tree stage.

As we will see, for the common case of values of n which are powers of two, the above reduction can lead to an implementation where the delay of the compressor tree is reduced by one XOR2 gate while keeping a regular layout. Since we are focusing on small values of n and fast single-cycle units, this reduction might be important in cases where, for example, high computation performance is required through the assembly of a large number of small processing units with limited computation capabilities, such as 8 x 8 or 16 x 16 multipliers [8].
A similar study, aimed at the reduction of the maximum height to n/2 but using a different approach, has recently presented interesting results in [9] and, previously by the same authors, in [10]. Thus, in the following, we will evaluate and compare the proposed approach with the technique in [9]. Additional details of our approach, besides the main results presented here, can be found in [11].

The paper is organized as follows: in Section 2, the multiplication algorithm based on MBE is briefly reviewed and analyzed. In Section 3, we describe related works. In Section 4, we present our scheme to reduce the maximum height of the partial product array by one unit during the generation of the PP rows. Finally, in Section 5, we provide evaluations and comparisons.
2. MODIFIED BOOTH RECODED MULTIPLIERS:
In general, a radix-B = 2^b MBE leads to a reduction of the number of rows to about n/b while, on the other hand, it introduces the need to generate all the multiples of the multiplicand X, at least from -B/2 x X to +B/2 x X. As mentioned above, radix-4 MBE is of particular interest since, for radix 4, it is easy to create the multiples of the multiplicand 0, +-X, +-2X. In particular, +-2X can be simply obtained by a single left shift of the corresponding terms +-X. It is clear that MBE can be extended to higher radices (see [12] among others), but the advantage of obtaining a higher reduction in the number of rows is paid for by the need to generate more multiples of X. In this paper, we focus our attention on radix-4 MBE, although the proposed method can be easily extended to any radix-B MBE [11].
From an operational point of view, it is well known that the radix-4 MBE scheme consists of scanning the multiplier operand with a three-bit window and a stride of two bits (radix-4). For each group of three bits (y_{2i+1}, y_{2i}, y_{2i-1}), only one partial product row is generated according to the encoding in Table 1. A possible implementation of the radix-4 MBE and of the corresponding partial product generation is shown in Fig. 1, which comes from a small adaptation of [10, Fig. 12b]. For each partial product row, Fig. 1a produces the one, two, and neg signals. These signals are then exploited by the logic in Fig. 1b, along with the appropriate bits of the multiplicand, in order to generate the whole partial product array. Other alternatives for the implementation of the recoding and partial product generation can be found in [13], [14], [15], among others.
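One common mux-and-XOR form of the per-bit partial product generation driven by the one/two/neg signals is sketched below; it is given only as an illustration and is not claimed to be the exact gate network of Fig. 1b.

    module mbe_pp_bit (
      input  wire one, two, neg,   // recoding signals of this row
      input  wire xj, xjm1,        // multiplicand bits x[j] and x[j-1]
      output wire pp               // partial product bit of this row, column j
    );
      wire sel  = (one & xj) | (two & xjm1);   // select X, 2X or zero
      assign pp = sel ^ neg;                   // bitwise complement for -X / -2X
    endmodule

The '+1' that completes the two's complement of a negative multiple is exactly the neg signal added at the LSB position of the row, as discussed below.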

As introduced previously, the use of radix-4 MBE allows for the (theoretical) reduction of the PP rows to n/2, with the possibility for each row to host a multiple y_i x X, with y_i in {0, +-1, +-2}. While it is straightforward to generate the positive terms 0, X, and 2X through at most a left shift of X, some attention is required to generate the terms -X and -2X which, as observed in Table 1, can arise from three configurations of the y_{2i+1}, y_{2i}, and y_{2i-1} bits. To avoid computing the negative encodings, i.e., -X and -2X, the two's complement of the multiplicand is generally used. From a mathematical point of view, the use of two's complement requires extension of the sign to the leftmost part of each partial product row, with the consequence of extra area overhead. Thus, a number of strategies for preventing sign extension have been developed. For instance, the scheme in [1] relies on the observation that the string of replicated sign bits can be replaced by the complemented sign bit plus a constant, which is precomputed into the array. The array resulting from the application of the sign extension prevention technique in [1] to the partial product array of an 8 x 8 MBE multiplier [5] is shown in Fig. 2.

The use of two's complement requires a neg signal (e.g., neg0, neg1, neg2, and neg3 in Fig. 2) to be added in the LSB position of each partial product row for generating the two's complement, as needed. Thus, although for an n x n multiplier only n/2 partial products are generated, the maximum height of the partial product array is n/2 + 1.
When 4-to-2 compressors are used, which is a widely used option because of the high regularity of the resultant circuit layout for n a power of two, the reduction of the extra row may require an additional delay of two XOR2 gates. By properly connecting partial product rows and using a Wallace reduction tree [7], the extra delay can be further reduced to one XOR2 [16], [17]. However, the reduction still requires additional hardware, roughly a row of n half adders. This issue is of special interest when n is a power of two, which is by far a very common case, and the multiplier's critical path has to fit within the clock period of a high-performance processor. For instance, in the design presented in [2], for n = 16, the maximum column height of the partial product array is nine, with an equivalent delay for the reduction of six XOR2 gates [16], [17]. For a maximum height of the partial product array of 8, the delay of the reduction tree would be reduced by one XOR2 gate [16], [17]. Alternatively, with a maximum height of eight, it would be possible to use 4-to-2 adders, with a delay of the reduction tree of six XOR2 gates, but with a very regular layout.
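A 4-to-2 compressor of the kind mentioned above is commonly built from two cascaded full adders; the following sketch (illustrative names) shows this construction. Its horizontal carry-out does not depend on the horizontal carry-in, which is what makes the column-wise layout regular.

    module compressor_4to2 (
      input  wire x1, x2, x3, x4, cin,
      output wire sum, carry, cout
    );
      wire s1;
      assign s1    = x1 ^ x2 ^ x3;                        // first full adder
      assign cout  = (x1 & x2) | (x1 & x3) | (x2 & x3);   // does not depend on cin
      assign sum   = s1 ^ x4 ^ cin;                       // second full adder
      assign carry = (s1 & x4) | (s1 & cin) | (x4 & cin);
    endmodule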
3. RELATED WORK:
Some approaches have been proposed aiming to add the n/2 + 1 rows, possibly in the same time as the n/2 rows. The solution presented in [14] is based on the use of different types of counters, that is, it operates at the level of the PP reduction phase. Kang and Gaudiot propose a different approach in [9] that manages to achieve the goal of eliminating the extra row before the PP reduction phase. This approach is based on computing the two's complement of the last partial product, thus eliminating the need for the last neg signal, in logarithmic time complexity. A special tree structure (basically an incrementer implemented as a prefix tree [18]) is used in order to produce the two's complement (Fig. 3), by decoding the MBE signals through a 3-5 decoder (Fig. 4a). Finally, a row of 4-1 multiplexers with implicit zero output is used (Fig. 4b) to produce the last partial product row directly in two's complement, without the need for the neg signal. The goal is to produce the two's complement in parallel with the computation of the partial products of the other rows with maximum overlap. In such a case, it is expected to have no or only a small time penalization in the critical path. The architecture in [9], [18] is a logarithmic version of the linear method presented in [19] and [20]. With respect to [19], [20], the approach in [9] is more general, and shows better adaptability to any word size. An example of the partial product array produced using the above method is depicted in Fig. 5.
In this work, we present a technique that also aims at producing only n/2 rows, but by relying on a different approach than [9].
4. BASIC IDEA:
The case of n x n square multipliers is quite common, as is the case of n being a power of two. Thus, we start by focusing our attention on square multipliers, and then present the extension to the general case of m x n rectangular multipliers.
4.1 Square Multipliers:

The proposed approach is general and, for the sake of clarity, will be explained through the practical case of 8 x 8 multiplications (as in the previous figures). As briefly outlined in the previous sections, the main goal of our approach is to produce a partial product array with a maximum height of n/2 rows, without introducing any additional delay.
Let us consider, as the starting point, the form of the simplified array as reported in Fig. 2, for all the partial product rows except the first one. As depicted in Fig. 6a, the first row is temporarily considered as being split into two sub-rows, the first one containing the partial product bits (from right to left) from pp00 to the complemented pp80, and the second one with two bits set at "one" in positions 9 and 8. Then the bit neg3, related to the fourth partial product row, is moved to become a part of the second sub-row. The key point of this "graphical" transformation is that the second sub-row, containing also the bit neg3, can now be easily added to the first sub-row, with a constant short carry propagation of three positions (further denoted as "3-bit addition"), a value which is easily shown to be general, i.e., independent of the length of the operands, for square multipliers. In fact, this can be verified with reference to the notation of Fig. 6. As introduced above, due to the particular value of the second operand, i.e., 0 1 1 0 neg3, in [11] we have observed that it requires a carry propagation only across the least significant three positions, a fact that can also be seen from the implementation shown in Fig. 7.
It is worth observing that, in order not to have delay penalizations, it is necessary that the generation of the other rows is done in parallel with the generation of the first row cascaded by the computation of the bits qq70, qq60 in Fig. 6b. In order to achieve this, we must simplify and differentiate the generation of the first row with respect to the other rows. We observe that the Booth recoding for the first row is computed more easily than for the other rows, because the y_{-1} bit used by the MBE is always equal to zero. In order to have a preliminary analysis which is possibly independent of technological details, we refer to the circuits in the following figures:
Fig. 1, slightly adapted from [10, Fig. 12], for the partial product generation using MBE;
Fig. 7, obtained through manual synthesis (aimed at modularity and area reduction without compromising the delay), for the addition of the last neg bit to the three most significant bits of the first row;
Fig. 8, obtained by simplifying Fig. 1 (since, in the first row, y_{2i-1} = 0), for the partial product generation of the first row only using MBE; and
Fig. 9, obtained through manual synthesis of a combination of the two parts of Fig. 8 and aimed at decreasing the delay of Fig. 8 with no or very small area increase, for the partial product generation of the first row only using MBE.
In particular, we observe that, by direct comparison of Figs. 1 and 8, the generation of the MBE signals for the first row is simpler, and theoretically allows for the saving of the delay of one NAND3 gate. In addition, the implementation in Fig. 9 has a delay that is smaller than the two parts of Fig. 8, although it could require a small amount of additional area.
As we see in the following, this issue hardly has any significant impact on the overall design, since this extra hardware is used only for the three most significant bits of the first row, and not for all the other bits of the array.
The high-level description of our idea is as follows (a behavioral sketch is given after this list):
1. Generation of the three most significant bit weights of the first row, plus addition of the last neg bit: possible implementations can use a replication of three times the circuit of Fig. 9 (one for each of the three most significant bits of the first row), cascaded by the circuit of Fig. 7 to add the neg signal;
2. Parallel generation of the other bits of the first row: possible implementations can use instances of the circuitry depicted in Fig. 8, for each bit of the first row except for the three most significant;
3. Parallel generation of the bits of the other rows: possible implementations can use the circuitry of Fig. 1, replicated for each bit of the other rows.
All items 1 to 3 are independent, and therefore can be executed in parallel. Clearly if, as assumed and expected, item 1 is not the bottleneck (i.e., the critical path), then the implementation of the proposed idea has reached the goal of not introducing time penalties.
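The following behavioral sketch illustrates the "graphical" transformation of Fig. 6a for the 8 x 8 case. The first sub-row bits (pp00 to the complemented pp80) are taken as an input, and the second sub-row, i.e., the two constant ones in positions 9 and 8 plus the relocated neg bit in position 6, is simply added to it. The module and port names are illustrative, and the report's Fig. 7 realizes this step with a dedicated short addition rather than a generic adder.

    module first_row_fold_neg (
      input  wire [8:0]  subrow_a,   // pp00 ... complemented pp80, positions 0..8
      input  wire        neg_last,   // neg bit moved from the last row (position 6)
      output wire [10:0] row0        // completed first row, positions 0..10
    );
      // Second sub-row "0 1 1 0 neg_last" over positions 10..6, as in Fig. 6a.
      wire [10:0] subrow_b = {2'b01, 1'b1, 1'b0, neg_last, 6'b0};
      assign row0 = {2'b00, subrow_a} + subrow_b;   // the carry stays within a short span
    endmodule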
4.2 Extension to Rectangular Multipliers:
A number of potential extensions to the proposed method exist, including rectangular multipliers, higher radix MBE, and multipliers with fused accumulation [11]. Here, we briefly focus on m x n rectangular multipliers. With no loss of generality, we assume m >= n, i.e., m = n + m' with m' >= 0, since it leads to a smaller number of rows; for simplicity, and also with no loss of generality, in the following we assume that both m and n are even. Now, we have seen in Fig. 6a that for m' = 0 the last neg bit belongs to the same column as the most significant partial product bits of the first row. For m' > 0 the first partial product row has bits extending further to the left; therefore, in order to also include in the first row the contribution of the last neg bit, due to the particular nature of the operands it is necessary to perform a correspondingly wider carry propagation in the sum. Thus, for rectangular multipliers, the proposed approach can be applied with the cost of a wider, but still short, bit addition.
The complete or even partial execution overlap of the first row with the generation of the other rows clearly depends on a number of factors, including the value of m' and the way that this bit addition is implemented, but the proposed approach still offers an interesting alternative that can possibly be explored for designing and implementing rectangular multipliers.
5. EVALUATION AND COMPARISONS:
In this section, the proposed method, based on the addition of the last neg signal to the first row, is first evaluated. The designed architecture is then compared with an implementation based on the computation of the two's complement of the last row (referred to as the "Two's complement" method), using the designs for the 3-5 decoders, 4-1 multiplexers, and two's complement tree in [9]. Moreover, in the analysis, the standard MBE implementations for the first and for a generic partial product row are also taken into account (as summarized in Table 2).
For all the implementations, we explicitly evaluate the most common case of an n x n multiplier, although we have shown in Section 4 that the proposed approach can also be extended to m x n rectangular multipliers. While studying the framework of possible implementations, we considered the first phase of the multiplication algorithm (i.e., the partial product generation) and we focused our attention on the issues of area occupancy and modular design, since it is reasonable to expect that they lead to a possibly small multiplier with a regular layout. The detailed results of some extensive evaluations and comparisons, both based on theoretical analysis and related implementations, are reported in [11]. Results encompass the following:
1. Theoretical analysis based on the concept of equivalent gates from Gajski's analysis [21] (as in [9]);
2. Theoretical analysis based on delay and area costs for elementary gates in a standard cell library;
3. Theoretical analysis showing that the proposed approach, in the version minimizing area, can very likely overlap the generation of the first row with the generation of the other rows; and
4. Validation by logic synthesis and technology mapping to an industrial cell library.
All the results show the feasibility of the proposed approach. Here, for the sake of simplicity, we quickly summarize the results of the theoretical analysis and we check the validity of our estimations through logic synthesis and simulation.
5.1 High-Level Remarks and Theoretical Analysis:
As can be seen from Fig. 6, the generation of the first row is different from the generation of the other rows, basically for two reasons:
1. The first row needs to assimilate the last neg signal, an operation which requires an addition over the three most significant bit weights;
2. The first row can take advantage of a simpler Booth recoding, as the y_{-1} bit used by the MBE is always equal to zero (Section 4).
As seen before, in Fig. 8 we have a possible implementation to generate the first row, which takes into account the simpler generation of the MBE signals. We have seen that by combining the two parts of Fig. 8 we get Fig. 9, which is faster than Fig. 8, at a possibly slightly larger area cost, certainly very marginal with respect to the global area of all the partial product bits coming from the other rows. We have done some rough simulations and found that a good trade-off could be to have the generation of the first bits of the first row carried out by the circuit of Fig. 9, followed by the cascaded addition provided by Fig. 7 (Section 4).
Based on all of the above, our architecture has been designed to perform the following operations:
1. Generation of the three most significant bit weights of the first row (through the very small and regular circuitry of Fig. 9) and addition to these bits of the neg signal (by means of the circuitry of Fig. 7);
2. Generation of the other bits of the first row, using the circuitry depicted in Fig. 8; and
3. Generation of the bits of the other rows, using the circuitry of Fig. 1.
As these three operations can be carried out in parallel, the overall critical path of the proposed architecture emerges from the largest delay among the above paths.
The critical path and area cost for the proposed architecture, as well as for the other implementations in Table 2, were computed with reference to a 130 nm HCMOS standard cell library from STMicroelectronics [22] (later also used for obtaining overall synthesis results). In this analysis, the contribution of wires was neglected, and a buffer-free configuration was considered. Nonetheless, details regarding buffer stage location and size are discussed in [11]. Data concerning area and delay for the elementary cells used in this work (as well as in [9]) are reported in Table 3. Results are reported in Tables 4 and 5, respectively. It is worth observing that results may vary depending on specific parameters selected for the synthesis, such as logic implementation, optimization strategies, and target libraries.
We observe that the "Two's Complement" approach has a delay that is longer than the delay to generate the standard partial product rows, becoming even longer as the size n of the multiplier increases (e.g., exceeding the delay of an XNOR2 gate starting from n >= 16). On the other hand, according to the theoretical estimations, we can see that the delay for generating the first row in the proposed method is estimated to be lower than the delay for generating the standard rows. This means that the extra row is eliminated without any penalty on the overall critical path.
With respect to area costs, it can be observed that the proposed method hardly introduces any area overhead with respect to the standard generation of a partial product row. On the other hand, the "Two's Complement" approach requires additional hardware, which increases with the size of the multiplier.
5.2 Implementation Results:
In order to further check the validity of our estimations in an implementation technology, we implemented the designs in Table 2 through logic synthesis and technology mapping to an industrial standard cell library. Specifically, for the logic synthesis we used Synopsys Design Compiler, and the designs were mapped to a 130 nm HCMOS industrial library from STMicroelectronics [22].
To perform the evaluation, we obtained the area-delay space for the sole generation of the partial product row of interest (i.e., the first row in the proposed approach, the last row in the implementation presented in [9]). In order to support the comparison, the area-delay space for the generation of the partial product rows using standard MBE implementations was also evaluated, by considering the first row and the other rows of the partial product array separately (Table 2). The results, obtained for n = 8, 16, and 32, are depicted in Fig. 10.
The delays are shown both in absolute units (ns) and normalized to the delay of an inverter with a fan-out of four (68 ps for the technology used, under worst-case conditions). Accordingly, the area is presented both in absolute units (um^2) and normalized to equivalent gates using the area of a NAND2 gate (4.39 um^2 for the technology used). We obtained several design points (using different target delays) for each approach, and the minimum delay shown corresponds to the fastest design that the tool was capable of synthesizing.
We observe that the "Proposed method" implementation produces a curve in the delay-area graph bounded by the curve for the generation of a standard partial product (upper bound) and by the curve for the standard generation of the first partial product (lower bound) for the three values of n considered. Moreover, the minimum delay that is achieved is very similar to the case of the generation of a standard partial product for n = 8, 16 (with our approach it is about 0.5-0.7 FO4 higher), and is even less for n = 32 due to the predominant effect of the higher loading of the control signals. Therefore, our scheme does not introduce any additional delay in the partial product generation stage for target delays higher than about 5 FO4.
The curve for our scheme gets closer to the curve corresponding to the standard generation of the first partial product as n increases. This is due to the fact that as n increases, the short addition of the leading part achieves more overlap with the generation of the rest of the partial product (with higher input load capacitance as n increases).
The "Two's Complement" scheme achieves minimum delays between 7 and 10 FO4, at the cost of requiring more than four times the area at this point, compared to the "Proposed method" approach. Most importantly, its delay is much higher than that of any standard row.
6. CONCLUSIONS:
Two's complement n x n multipliers using radix-4 Modified Booth Encoding produce n/2 partial products, but due to the sign handling, the partial product array has a maximum height of n/2 + 1. We presented a scheme that produces a partial product array with a maximum height of n/2, without introducing any extra delay in the partial product generation stage. With the extra hardware of a (short) 3-bit addition, and the simpler generation of the first partial product row, we have been able to achieve a delay for the proposed scheme within the bound of the delay of a standard partial product row generation. The outcome of the above is that the reduction of the maximum height of the partial product array by one unit may simplify the partial product reduction tree, both in terms of delay and regularity of the layout. This is of special interest for all multipliers, and especially for single-cycle short bit-width multipliers for high-performance embedded cores, where short bit-width multiplications are common operations. We have also compared our approach with a recent proposal with the same aim, considering results using a widely used industrial synthesis tool and a modern industrial technology library, and concluded that our approach may improve both the performance and area requirements of square multiplier designs. The proposed approach also applies, with minor modifications, to rectangular and to general radix-B Modified Booth Encoding multipliers.
7. References:
1. M.D. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann Publishers, 2003.
2. S.K. Hsu, S.K. Mathew, M.A. Anders, B.R. Zeydel, V.G. Oklobdzija, R.K. Krishnamurthy, and S.Y. Borkar, "A 110GOPS/W 16-Bit Multiplier and Reconfigurable PLA Loop in 90-nm CMOS," IEEE J. Solid State Circuits, vol. 41, no. 1, pp. 256-264, Jan. 2006.
3. H. Kaul, M.A. Anders, S.K. Mathew, S.K. Hsu, A. Agarwal, R.K. Krishnamurthy, and S. Borkar, "A 300 mV 494GOPS/W Reconfigurable Dual-Supply 4-Way SIMD Vector Processing Accelerator in 45 nm CMOS," IEEE J. Solid State Circuits, vol. 45, no. 1, pp. 95-101, Jan. 2010.
4. M.S. Schmookler, M. Putrino, A. Mather, J. Tyler, H.V. Nguyen, C. Roth, M. Sharma, M.N. Pham, and J. Lent, "A Low-Power, High-Speed Implementation of a PowerPC Microprocessor Vector Extension," Proc. 14th IEEE Symp. Computer Arithmetic, pp. 12-19, 1999.
5. O.L. MacSorley, "High Speed Arithmetic in Binary Computers," Proc. IRE, vol. 49, pp. 67-91, Jan. 1961.
6. L. Dadda, "Some Schemes for Parallel Multipliers," Alta Frequenza, vol. 34, pp. 349-356, May 1965.
7. C.S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Trans. Electronic Computers, vol. EC-13, no. 1, pp. 14-17, Feb. 1964.
8. D.E. Shaw, "Anton: A Specialized Machine for Millisecond-Scale Molecular Dynamics Simulations of Proteins," Proc. 19th IEEE Symp. Computer Arithmetic, p. 3, 2009.
9. J.-Y. Kang and J.-L. Gaudiot, "A Simple High-Speed Multiplier Design," IEEE Trans. Computers, vol. 55, no. 10, pp. 1253-1258, Oct. 2006.
10. J.-Y. Kang and J.-L. Gaudiot, "A Fast and Well-Structured Multiplier," Proc. Euromicro Symp. Digital System Design, pp. 508-515, Sept. 2004.
11. F. Lamberti, N. Andrikos, E. Antelo, and P. Montuschi, "Speeding-Up Booth Encoded Multipliers by Reducing the Size of Partial Product Array," internal report, http://arith.polito.it/ir_mbe.pdf, pp. 1-14, 2009.
12. E.M. Schwarz, R.M. Averill III, and L.J. Sigal, "A Radix-8 CMOS S/390 Multiplier," Proc. 13th IEEE Symp. Computer Arithmetic, pp. 2-9, 1997.
13. W.-C. Yeh and C.-W. Jen, "High-Speed Booth Encoded Parallel Multiplier Design," IEEE Trans. Computers, vol. 49, no. 7, pp. 692-701, July 2000.
14. Z. Huang and M.D. Ercegovac, "High-Performance Low-Power Left-to-Right Array Multiplier Design," IEEE Trans. Computers, vol. 54, no. 3, pp. 272-283, Mar. 2005.
15. R. Zimmermann and D.Q. Tran, "Optimized Synthesis of Sum-of-Products," Proc. Conf. Record of the 37th Asilomar Conf. Signals, Systems and Computers, vol. 1, pp. 867-872, 2003.
16. V.G. Oklobdzija, D. Villeger, and S.S. Liu, "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach," IEEE Trans. Computers, vol. 45, no. 3, pp. 294-306, Mar. 1996.
17. P.F. Stelling, C.U. Martel, V.G. Oklobdzija, and R. Ravi, "Optimal Circuits for Parallel Multipliers," IEEE Trans. Computers, vol. 47, no. 3, pp. 273-285, Mar. 1998.
18. J.-Y. Kang and J.-L. Gaudiot, "A Logarithmic Time Method for Two's Complementation," Proc. Int'l Conf. Computational Science, pp. 212-219, 2005.
19. K. Hwang, Computer Arithmetic: Principles, Architecture, and Design. Wiley, 1979.
20. R. Hashemian and C.P. Chen, "A New Parallel Technique for Design of Decrement/Increment and Two's Complement Circuits," Proc. 34th Midwest Symp. Circuits and Systems, vol. 2, pp. 887-890, 1991.
21. D. Gajski, Principles of Digital Design. Prentice-Hall, 1997.
22. STMicroelectronics, "130nm HCMOS9 Cell Library," http://www.st.com/stonline/products/technologies/soc/evol.htm, 2010.
Syntax report
Started : "Check Syntax for Partial_product".

=========================================================================
*                            HDL Compilation                            *
=========================================================================
Compiling verilog file "fig8b.v" in library work
Compiling verilog file "fig8a.v" in library work
Module <fig8b> compiled
Compiling verilog file "fig1b.v" in library work
Module <fig8a> compiled
Compiling verilog file "fig1a.v" in library work
Module <fig1b> compiled
Compiling verilog file "fig1.v" in library work
Module <fig1a> compiled
Module <Partial_product> compiled
No errors in compilation
Analysis of file <"Partial_product.prj"> succeeded.

Process "Check Syntax" completed successfully
Synthesis report
Release 9.2i - xst J.36
Copyright (c) 1995-2007 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to ./xst/projnav.tmp
CPU : 0.00 / 0.13 s | Elapsed : 0.00 / 0.00 s

--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.13 s | Elapsed : 0.00 / 0.00 s

--> Reading design: Partial_product.prj

TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Compilation
3) Design Hierarchy Analysis
4) HDL Analysis
5) HDL Synthesis
5.1) HDL Synthesis Report
6) Advanced HDL Synthesis
6.1) Advanced HDL Synthesis Report
7) Low Level Synthesis
8) Partition Report
9) Final Report
9.1) Device utilization summary
9.2) Partition Resource Summary
9.3) TIMING REPORT

=========================================================================
*                       Synthesis Options Summary                       *
=========================================================================
---- Source Parameters
Input File Name : "Partial_product.prj"
Input Format : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name : "Partial_product"
Output Format : NGC
Target Device : xc3s500e-5-cp132
---- Source Options
Top Module Name : Partial_product
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
Safe Implementation : No
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
ROM Style : Auto
Mux Extraction : YES
Resource Sharing : YES
Asynchronous To Synchronous : NO
Multiplier Style : auto
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 500
Add Generic Clock Buffer(BUFG) : 24
Register Duplication : YES
Slice Packing : YES
Optimize Instantiated Primitives : NO
Use Clock Enable : Yes
Use Synchronous Set : Yes
Use Synchronous Reset : Yes
Pack IO Registers into IOBs : auto
Equivalent register Removal : YES
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Library Search Order : Partial_product.lso
Keep Hierarchy : NO
RTL Output : Yes
Global Optimization : AllClockNets
Read Cores : YES
Write Timing Constraints : NO
Cross Clock Analysis : NO
Hierarchy Separator : /
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
BRAM Utilization Ratio : 100
Verilog 2001 : YES
Auto BRAM Packing : NO
Slice Utilization Ratio Delta : 5
=========================================================================
*                            HDL Compilation                            *
=========================================================================
Compiling verilog file "fig8b.v" in library work
Compiling verilog file "fig8a.v" in library work
Module <fig8b> compiled
Compiling verilog file "fig1b.v" in library work
Module <fig8a> compiled
Compiling verilog file "fig1a.v" in library work
Module <fig1b> compiled
Compiling verilog file "fig1.v" in library work
Module <fig1a> compiled
Module <Partial_product> compiled
No errors in compilation
Analysis of file <"Partial_product.prj"> succeeded.

=========================================================================
*                        Design Hierarchy Analysis                      *
=========================================================================
Analyzing hierarchy for module <Partial_product> in library <work>.
Analyzing hierarchy for module <fig8a> in library <work>.
Analyzing hierarchy for module <fig8b> in library <work>.
Analyzing hierarchy for module <fig1a> in library <work>.
Analyzing hierarchy for module <fig1b> in library <work>.
=========================================================================
*                              HDL Analysis                             *
=========================================================================
Analyzing top module <Partial_product>.
Module <Partial_product> is correct for synthesis.

Analyzing module <fig8a> in library <work>.
Module <fig8a> is correct for synthesis.

Analyzing module <fig8b> in library <work>.
Module <fig8b> is correct for synthesis.

Analyzing module <fig1a> in library <work>.
Module <fig1a> is correct for synthesis.

Analyzing module <fig1b> in library <work>.
Module <fig1b> is correct for synthesis.

=========================================================================
*                             HDL Synthesis                             *
=========================================================================
Performing bidirectional port resolution...
Synthesizing Unit <fig8a>.
    Related source file is "fig8a.v".
Unit <fig8a> synthesized.
Synthesizing Unit <fig8b>.
    Related source file is "fig8b.v".
    Found 1-bit xor2 for signal <pp05_0$xor0000>.
Unit <fig8b> synthesized.
Synthesizing Unit <fig1a>.
    Related source file is "fig1a.v".
    Found 1-bit xor2 for signal <onei001>.
Unit <fig1a> synthesized.
Synthesizing Unit <fig1b>.
    Related source file is "fig1b.v".
    Found 1-bit xor2 for signal <ppi5_0$xor0000>.
Unit <fig1b> synthesized.
Synthesizing Unit <Partial_product>.
    Related source file is "fig1.v".
WARNING:Xst:1306 - Output <qq90bar> is never assigned.
WARNING:Xst:1306 - Output <qq60> is never assigned.
WARNING:Xst:1306 - Output <qq70> is never assigned.
WARNING:Xst:1306 - Output <qq80> is never assigned.
WARNING:Xst:1306 - Output <qq90> is never assigned.
WARNING:Xst:646 - Signal <pp01> is assigned but never used.
WARNING:Xst:646 - Signal <pp02> is assigned but never used.
WARNING:Xst:646 - Signal <pp03> is assigned but never used.
WARNING:Xst:646 - Signal <pp11> is assigned but never used.
WARNING:Xst:646 - Signal <pp12> is assigned but never used.
WARNING:Xst:646 - Signal <pp13> is assigned but never used.
WARNING:Xst:646 - Signal <pp21> is assigned but never used.
WARNING:Xst:646 - Signal <pp22> is assigned but never used.
WARNING:Xst:646 - Signal <pp23> is assigned but never used.
WARNING:Xst:646 - Signal <pp31> is assigned but never used.
WARNING:Xst:646 - Signal <pp32> is assigned but never used.
WARNING:Xst:646 - Signal <pp33> is assigned but never used.
WARNING:Xst:646 - Signal <pp41> is assigned but never used.
WARNING:Xst:646 - Signal <pp42> is assigned but never used.
WARNING:Xst:646 - Signal <pp43> is assigned but never used.
WARNING:Xst:646 - Signal <pp51> is assigned but never used.
WARNING:Xst:646 - Signal <pp52> is assigned but never used.
WARNING:Xst:646 - Signal <pp53> is assigned but never used.
WARNING:Xst:646 - Signal <pp60001> is assigned but never used.
WARNING:Xst:646 - Signal <pp61> is assigned but never used.
WARNING:Xst:646 - Signal <pp62> is assigned but never used.
WARNING:Xst:646 - Signal <pp63> is assigned but never used.
WARNING:Xst:646 - Signal <pp70001> is assigned but never used.
WARNING:Xst:646 - Signal <pp71> is assigned but never used.
WARNING:Xst:646 - Signal <pp72> is assigned but never used.
WARNING:Xst:646 - Signal <pp73> is assigned but never used.
WARNING:Xst:1780 - Signal <pp80> is never used or assigned.
Unit <Partial_product> synthesized.
=========================================================================
HDL Synthesis Report

Macro Statistics
# Xors : 35
 1-bit xor2 : 35

=========================================================================

=========================================================================
*                         Advanced HDL Synthesis                        *
=========================================================================
(oadin* de+ice for a""ication 6f!'e+ice fro) ,e P33500e.n"hP in en+iron)ent
C:Q;iinx92i.
WARNING:Xst:1290 - Hierarchical block <row1pp60> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row1pp70> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p011> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p01> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p11> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p21> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p31> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p41> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p51> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p61> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row2p71> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p011> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p01> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p11> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p21> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p31> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p41> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p51> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p61> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row3p71> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p011> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p01> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p11> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p21> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p31> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p41> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p51> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p61> is unconnected in block <Partial_product>.
   It will be removed from the design.
WARNING:Xst:1290 - Hierarchical block <row4p71> is unconnected in block <Partial_product>.
   It will be removed from the design.
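Each Xst:1290 message above most likely refers to an instantiated row cell whose outputs end up driving nothing once the unused partial product bits are trimmed, so the whole instance is dropped. A small sketch of the effect follows; trim_demo, bit_cell and the instance names are illustrative only:

module bit_cell (
    input  wire x,
    input  wire y,
    output wire z
);
    assign z = x & y;
endmodule

module trim_demo (
    input  wire a,
    input  wire b,
    output wire q
);
    wire nc;                                   // never read anywhere
    bit_cell u_used   (.x(a), .y(b), .z(q));   // kept
    bit_cell u_unused (.x(a), .y(b), .z(nc));  // unconnected hierarchical block -> removed (Xst:1290)
endmodule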
=========================================================================
Advanced HDL Synthesis Report

Macro Statistics
# Xors                             : 35
 1-bit xor2                        : 35

=========================================================================
=========================================================================
*                         Low Level Synthesis                           *
=========================================================================

Optimizing unit <Partial_product> ...
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block Partial_product, actual ratio is 0.

Final Macro Processing ...

=========================================================================
Final Register Report

Found no macro
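Since the Partial_product unit contains no sequential logic, the register report finds no macro; for the same reason the timing report further down shows no clock signals and prints "No path found" for the clock-related metrics, leaving the maximum combinational path delay as the only relevant figure.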
=========================================================================

=========================================================================
*                           Partition Report                            *
=========================================================================

Partition Implementation Status
-------------------------------

  No Partitions were found in this design.

-------------------------------

=========================================================================
*                            Final Report                               *
=========================================================================

Final Results
RTL Top Level Output File Name     : Partial_product.ngr
Top Level Output File Name         : Partial_product
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 82

Cell Usage :
# BELS                             : 6
#      LUT3                        : 1
#      LUT4                        : 5
# IO Buffers                       : 44
#      IBUF                        : 8
#      OBUF                        : 36
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 3s500ecp132-5

 Number of Slices:                  3  out of   4656     0%
 Number of 4 input LUTs:            6  out of   9312     0%
 Number of IOs:                    82
 Number of bonded IOBs:            44  out of     92    47%

---------------------------
Partition Resource Summary:
---------------------------

  No Partitions were found in this design.

---------------------------

=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
No clock signals found in this design

Asynchronous Control Signals Information:
-----------------------------------------
No asynchronous control signals found in this design

Timing Summary:
---------------
Speed Grade: -5

   Minimum period: No path found
   Minimum input arrival time before clock: No path found
   Maximum output required time after clock: No path found
   Maximum combinational path delay: 6.176ns

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

=========================================================================
Timing constraint: Default path analysis
Total number of paths / destination ports: 138 / 36
-------------------------------------------------------------------------
Delay:               6.176ns (Levels of Logic = 3)
  Source:            Mr<1> (PAD)
  Destination:       pp00<5> (PAD)

  Data Path: Mr<1> to pp00<5>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O             6   1.106   0.721  Mr_1_IBUF (Mr_1_IBUF)
     LUT3:I0->O            6   0.612   0.569  row1pp00/pp05_0_not00001 (pp00_5_OBUF)
     OBUF:I->O                 3.169          pp00_5_OBUF (pp00<5>)
    ----------------------------------------
    Total                      6.176ns (4.887ns logic, 1.289ns route)
                                       (79.1% logic, 20.9% route)

=========================================================================
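As a quick sanity check on the path above: the gate delays 1.106 + 0.612 + 3.169 ns add up to the reported 4.887 ns of logic, the net delays 0.721 + 0.569 ns give the 1.289 ns of routing (within rounding), and 4.887/6.176 ≈ 79.1% and 1.289/6.176 ≈ 20.9% match the logic/route split printed in the last line.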
CPU : 3.31 / 3.45 s | Elapsed : 3.00 / 3.00 s

-->

Total memory usage is 147804 kilobytes

Number of errors   :    0 (   0 filtered)
Number of warnings :   71 (   0 filtered)
Number of infos    :    0 (   0 filtered)
Test bench coding:
////////////////////////////////////////////////////////////////////////////////
// Copyright (c) 1995-2007 Xilinx, Inc.
// All Right Reserved.
////////////////////////////////////////////////////////////////////////////////
//   ____  ____
//  /   /\/   /
// /___/  \  /     Vendor: Xilinx
// \   \   \/      Version : 9.2i
//  \   \          Application : ISE
//  /   /          Filename : ts_tb_selfcheck.tfw
// /___/   /\      Timestamp : Mon Jan 23 18:06:08 2012
// \   \  /  \
//  \___\/\___\
//
//Command:
//Design Name: ts_tb_selfcheck_beh
//Device: Xilinx
//
`timescale 1ns/1ps

module ts_tb_selfcheck_beh;
    reg [7:0] Md = 8'b00000000;
    reg [7:0] Mr = 8'b00000000;
    wire [15:0] km;
    wire [15:0] k1;
    wire [15:0] k2;
    wire [15:0] k3;
    wire [15:0] k4;

    test UUT (
        .Md(Md),
        .Mr(Mr),
        .km(km),
        .k1(k1),
        .k2(k2),
        .k3(k3),
        .k4(k4));

    integer TX_ERROR = 0;

    initial begin  // Open the results file...
        #1000 // Final time: 1000 ns
        if (TX_ERROR == 0) begin
            $display("No errors or warnings.");
        end else begin
            $display("%d errors found in simulation.", TX_ERROR);
        end
        $stop;
    end
    initial begin
        // -------------  Current Time:  200ns
        #200;
        Mr = 8'b00001100;
        // -------------------------------------
        // -------------  Current Time:  250ns
        #50;
        CHECK_k2(16'b0000101111111100);
        CHECK_k3(16'b0011000000000100);
        // -------------------------------------
        // -------------  Current Time:  300ns
        #50;
        Md = 8'b00110111;
        // -------------------------------------
        // -------------  Current Time:  350ns
        #50;
        CHECK_km(16'b0000001010010100);
        CHECK_k2(16'b0000101100100000);
        CHECK_k3(16'b0011001101110100);
        // -------------------------------------
        // -------------  Current Time:  600ns
        #250;
        Md = 8'b01111101;
        Mr = 8'b10101101;
        // -------------------------------------
        // -------------  Current Time:  650ns
        #50;
        CHECK_km(16'b1101011101111001);
        CHECK_k1(16'b0000010010111101);
        CHECK_k2(16'b0000101000001000);
        CHECK_k3(16'b0010100000100100);
        CHECK_k4(16'b1010000010010000);
        // -------------------------------------
        // -------------  Current Time:  800ns
        #150;
        Md = 8'b10011001;
        Mr = 8'b10011001;
        // -------------------------------------
        // -------------  Current Time:  850ns
        #50;
        CHECK_km(16'b0010100101110001);
        CHECK_k1(16'b0000001111011001);
        CHECK_k2(16'b0000111100110100);
        CHECK_k3(16'b0010001100100100);
        CHECK_k4(16'b1111001101000000);
    end
    task CHECK_km;
        input [15:0] NEXT_km;
        #0 begin
            if (NEXT_km !== km) begin
                $display("Error at time=%dns km=%b, expected=%b", $time, km, NEXT_km);
                TX_ERROR = TX_ERROR + 1;
            end
        end
    endtask

    task CHECK_k1;
        input [15:0] NEXT_k1;
        #0 begin
            if (NEXT_k1 !== k1) begin
                $display("Error at time=%dns k1=%b, expected=%b", $time, k1, NEXT_k1);
                TX_ERROR = TX_ERROR + 1;
            end
        end
    endtask

    task CHECK_k2;
        input [15:0] NEXT_k2;
        #0 begin
            if (NEXT_k2 !== k2) begin
                $display("Error at time=%dns k2=%b, expected=%b", $time, k2, NEXT_k2);
                TX_ERROR = TX_ERROR + 1;
            end
        end
    endtask

    task CHECK_k3;
        input [15:0] NEXT_k3;
        #0 begin
            if (NEXT_k3 !== k3) begin
                $display("Error at time=%dns k3=%b, expected=%b", $time, k3, NEXT_k3);
                TX_ERROR = TX_ERROR + 1;
            end
        end
    endtask

    task CHECK_k4;
        input [15:0] NEXT_k4;
        #0 begin
            if (NEXT_k4 !== k4) begin
                $display("Error at time=%dns k4=%b, expected=%b", $time, k4, NEXT_k4);
                TX_ERROR = TX_ERROR + 1;
            end
        end
    endtask

endmodule
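Each CHECK_* task compares the value produced by the UUT against the expected vector passed in by the stimulus block and increments TX_ERROR on a mismatch; the first initial block then prints a pass/fail summary at 1000 ns. The same self-checking pattern, stripped down to a 1-bit XOR so the mechanism is easy to follow, might look as below (xor_gate and tb_xor are illustrative names, not part of this project):

`timescale 1ns / 1ps

module xor_gate (input wire a, input wire b, output wire y);
    assign y = a ^ b;
endmodule

module tb_xor;
    reg  a = 1'b0, b = 1'b0;
    wire y;
    integer TX_ERROR = 0;

    xor_gate UUT (.a(a), .b(b), .y(y));

    task CHECK_y;                       // same role as CHECK_km/CHECK_k1..k4 above
        input NEXT_y;
        begin
            if (NEXT_y !== y) begin
                $display("Error at time=%0dns y=%b, expected=%b", $time, y, NEXT_y);
                TX_ERROR = TX_ERROR + 1;
            end
        end
    endtask

    initial begin
        #10 a = 1'b1;                   // drive a stimulus step
        #1  CHECK_y(1'b1);              // 1 ^ 0 = 1
        #10 b = 1'b1;
        #1  CHECK_y(1'b0);              // 1 ^ 1 = 0
        if (TX_ERROR == 0) $display("No errors or warnings.");
        else               $display("%0d errors found in simulation.", TX_ERROR);
        $stop;
    end
endmodule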
o#t "#t .a+e for):
Sche)atic dia*ra):
>echnica 3che)atic dia*ra):