
DEPT OF ECE GIST

CHAPTER 1
INTRODUCTION
1.1 DECODE AND COMPARE ARCHITECTURE

This chapter discusses the conventional decode-and-compare architecture and the encode-and-compare architecture based on the direct compare method. For the sake of concreteness, only the tag matching performed in a cache memory is discussed here, but the proposed architecture can be applied to similar applications without loss of generality.

Figure 1.1: (a) Decode-and-compare architecture and (b) encode-and-compare architecture

Let us consider a cache memory where a k-bit tag is stored in the form of an n-bit codeword after being encoded by an (n, k) code. In the decode-and-compare architecture depicted in Fig. 1.1(a), the n-bit retrieved codeword must first be decoded to extract the original k-bit tag. The extracted k-bit tag is then compared with the k-bit tag field of an incoming address to determine whether the tags match. As the retrieved codeword must go through the decoder before being compared with the incoming tag, the critical path is too long to be employed in a practical cache system designed for high-speed access. In addition, since the decoder is one of the most complicated processing elements, the complexity overhead is not negligible.
Note that decoding is usually more complex and time-consuming than encoding, as it encompasses a series of error detection or syndrome calculation steps followed by error correction. The implementation results support this claim. To resolve the drawbacks of the decode-and-compare architecture, therefore, the decoding of a retrieved codeword is replaced with the encoding of an incoming tag in the encode-and-compare architecture. More precisely, a k-bit incoming tag is first encoded to the corresponding n-bit codeword X and compared with an n-bit retrieved codeword Y as shown in Fig. 1.1(b). The comparison examines by how many bits the two codewords differ, not whether the two codewords are exactly equal. For this, we compute the Hamming distance d between the two codewords and classify the cases according to the range of d. Let tmax and rmax denote the numbers of maximally correctable and detectable errors, respectively. The cases are summarized as follows.

i. If d = 0, X matches Y exactly.

ii. If 0 < d ≤ tmax, X will match Y provided at most tmax errors in Y are corrected.

iii. If tmax< d ≤ rmax, Y has detectable but uncorrectable errors. In this case, the cache
may issue a system fault so as to make the central processing unit take a proper action.

iv. If rmax < d, X does not match Y.
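The four cases above can be sketched in Python (a behavioral sketch only; the error correction itself is not modeled, just the range test on d):

```python
def classify_match(x: int, y: int, t_max: int, r_max: int) -> str:
    """Classify the Hamming distance d between codewords x and y
    (given as integers) into the four ranges described above."""
    d = bin(x ^ y).count("1")  # Hamming distance via XOR + popcount
    if d == 0:
        return "exact match"
    if d <= t_max:
        return "match after correcting at most t_max errors"
    if d <= r_max:
        return "detectable but uncorrectable errors"
    return "mismatch"

# Example with a toy 8-bit codeword pair, t_max = 1, r_max = 2
print(classify_match(0b10110100, 0b10110101, 1, 2))
```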

Fig. 1.2 SA-based architecture supporting the direct compare method

Assuming that the incoming address has no errors, we can regard the two tags as matched if d falls in either the first or the second range. In this way, while maintaining the error-correcting capability, the architecture removes the decoder from its critical path at the cost of a newly introduced encoder. Note that the encoder is, in general, much simpler than the decoder, and thus the encoding cost is significantly less than the decoding cost. Since the above method needs to compute the Hamming distance, a dedicated circuit was presented for the computation. The circuit shown in Fig. 1.2 first performs XOR operations for every pair of bits in X and Y so as to generate a vector representing the bitwise difference of the two codewords. The following half adders (HAs) count the number of 1's in two adjacent bits of the vector. The numbers of 1's are accumulated by passing through the following saturating adder (SA) tree. In the SA tree, the accumulated value z is saturated to rmax + 1 if it exceeds rmax. More precisely, given inputs x and y, z can be expressed as

z = x + y,        if x + y ≤ rmax
z = rmax + 1,     otherwise.    (1)


The final accumulated value indicates the range of d. As the compulsory saturation necessitates additional logic circuitry, the complexity of an SA is higher than that of a conventional adder.
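A minimal software sketch of this saturating accumulation, assuming the bitwise difference vector is given as a list of bits and rmax is known:

```python
def sat_add(x: int, y: int, r_max: int) -> int:
    """Saturating adder (SA) node: the sum is clamped to r_max + 1,
    mirroring equation (1)."""
    z = x + y
    return z if z <= r_max else r_max + 1

def sa_tree_distance(diff_bits: list[int], r_max: int) -> int:
    """Accumulate the 1's of the XOR difference vector through a tree
    of saturating adders; the result is min(d, r_max + 1)."""
    level = diff_bits[:]
    while len(level) > 1:
        if len(level) % 2:          # pad odd-length levels with 0
            level.append(0)
        level = [sat_add(level[i], level[i + 1], r_max)
                 for i in range(0, len(level), 2)]
    return level[0]

# d = 3 saturates to r_max + 1 = 3 when r_max = 2
print(sa_tree_distance([1, 0, 1, 1, 0, 0, 0, 0], 2))
```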

1.2 DECODER

In digital electronics, a decoder is a multiple-input, multiple-output logic circuit that converts coded inputs into coded outputs, where the input and output codes are different, e.g., n-to-2^n decoders and binary-coded-decimal decoders. Decoding is necessary in applications such as data multiplexing, 7-segment displays, and memory address decoding. The simplest example of a decoder circuit is an AND gate, because the output of an AND gate is "High" (1) only when all its inputs are "High". Such an output is called an "active High" output. If a NAND gate is connected in place of the AND gate, the output will be "Low" (0) only when all its inputs are "High". Such an output is called an "active Low" output. A slightly more complex decoder is the n-to-2^n binary decoder. These decoders are combinational circuits that convert binary information from n coded inputs to a maximum of 2^n unique outputs. In case the n-bit coded information has unused bit combinations, the decoder may have fewer than 2^n outputs. The 2-to-4, 3-to-8, and 4-to-16 decoders are typical examples. The input to a decoder is a parallel binary number, and the decoder is used to detect the presence of a particular binary number at its input. The output indicates the presence or absence of a specific number at the decoder input.

Let us suppose that a logic network has two inputs A and B. They give rise to four states, corresponding to the input combinations A'B', A'B, AB', and AB. The truth table for this decoder is shown below:

Table 1.1: Truth Table of 2:4 decoder


Fig 1.2 Logic Diagram of 2:4 decoder

Fig 1.3: Representation of 2:4 decoder

For any input combination, only one of the outputs is low and all others are high; the low output identifies the state of the inputs.

Two or more small decoders with enable inputs can be combined to form a larger decoder, e.g., a 3-to-8-line decoder constructed from two 2-to-4-line decoders. A decoder with an enable input can also function as a demultiplexer.
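The active-low decoder and the cascading idea can be sketched as follows (the enable input is the one used for cascading; pin-level details of any particular IC are not modeled):

```python
def decoder_2to4(a: int, b: int, enable: int = 1) -> list[int]:
    """2-to-4 decoder with active-low outputs: exactly one output is 0
    when enabled; all outputs are 1 when disabled."""
    if not enable:
        return [1, 1, 1, 1]
    index = (a << 1) | b          # inputs A (MSB) and B (LSB)
    return [0 if i == index else 1 for i in range(4)]

def decoder_3to8(a: int, b: int, c: int) -> list[int]:
    """3-to-8 decoder built from two 2-to-4 decoders; the MSB a
    selects which half is enabled."""
    return decoder_2to4(b, c, enable=(a == 0)) + \
           decoder_2to4(b, c, enable=(a == 1))

print(decoder_2to4(1, 0))   # output line 2 goes low
```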


1.3 ENCODER

An encoder is a device, circuit, transducer, software program, algorithm, or person that converts information from one format or code to another. The purpose of an encoder is standardization, speed, secrecy, security, or saving space by shrinking size. Encoders are combinational logic circuits and are exactly the opposite of decoders: they accept one or more inputs and generate a multibit output code. An encoder has M input lines and N output lines. Out of the M input lines only one is activated at a time, and the encoder produces the equivalent code on the N output lines. If a device's output code has fewer bits than its input code, the device is usually called an encoder.

1.3.1 Octal To Binary Encoder

An octal-to-binary encoder takes 8 inputs and provides 3 outputs, thus doing the opposite of what a 3-to-8 decoder does. At any one time, only one input line has a value of 1. The table below shows the truth table of an octal-to-binary encoder.

Fig 1.4: Logic Diagram of octal to binary encoder


Table 1.2: Truth Table of octal to binary encoder

For an 8-to-3 binary encoder with inputs I0-I7 the logic expressions of the outputs Y0-
Y2 are:

Y0 = I1 + I3 + I5 + I7

Y1= I2 + I3 + I6 + I7

Y2 = I4 + I5 + I6 +I7
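The three OR expressions above translate directly into code (a sketch; the inputs are given as a list I[0..7] with exactly one bit set):

```python
def octal_to_binary(i: list[int]) -> tuple[int, int, int]:
    """8-to-3 encoder implementing Y0, Y1, Y2 as the OR expressions above."""
    y0 = i[1] | i[3] | i[5] | i[7]
    y1 = i[2] | i[3] | i[6] | i[7]
    y2 = i[4] | i[5] | i[6] | i[7]
    return y2, y1, y0             # (MSB, ..., LSB)

# Input line I5 active -> binary 101
print(octal_to_binary([0, 0, 0, 0, 0, 1, 0, 0]))
```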

1.3.2 Priority Encoder

Fig 1.5: Logic Diagram of 4 bit priority encoder


A priority encoder is a circuit or algorithm that compresses multiple binary inputs into a smaller number of outputs. The output of a priority encoder is the binary representation of the ordinal number (starting from zero) of the most significant active input bit. Priority encoders are often used to control interrupt requests by acting on the highest-priority request. If two or more inputs are equal to 1 at the same time, the input having the highest priority takes precedence; internal hardware checks this condition and resolves the priority.


Table 1.3: Truth Table of 4 bit priority encoder.

The IC 74148 is an 8-input priority encoder, and the 74147 is a 10-to-4-line priority encoder.
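A behavioral sketch of the priority rule (highest-numbered input wins, with a valid flag; the exact pinout of the 74-series parts is not modeled):

```python
def priority_encode(inputs: list[int]) -> tuple[int, int]:
    """Return (code, valid): the index of the highest active input,
    scanned from the MSB down, and a flag showing any input was active."""
    for idx in range(len(inputs) - 1, -1, -1):
        if inputs[idx]:
            return idx, 1
    return 0, 0                   # no request pending

# I3 and I1 both high: I3 wins
print(priority_encode([0, 1, 0, 1]))
```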

1.4 DIGITAL COMPARATOR


Another common and very useful combinational logic circuit is the digital comparator. Digital or binary comparators are made up from standard AND, NOR, and NOT gates that compare the digital signals present at their input terminals and produce an output depending upon the condition of those inputs.

For example, along with being able to add and subtract binary numbers, we need to be able to compare them and determine whether the value of input A is greater than, smaller than, or equal to the value at input B. The digital comparator accomplishes this using several logic gates that operate on the principles of Boolean algebra. There are two main types of digital comparator:

1. Identity Comparator - an identity comparator has only one output terminal, for A = B, which is either "HIGH" (A = B = 1) or "LOW" (A = B = 0).

2. Magnitude Comparator - a magnitude comparator has three output terminals, one each for equality (A = B), greater than (A > B), and less than (A < B).

The purpose of a digital comparator is to compare a set of variables or unknown numbers, for example A (A1, A2, A3, ... An) against a constant or another unknown value such as B (B1, B2, B3, ... Bn), and produce an output condition or flag depending upon the result of the comparison. For example, a magnitude comparator of two 1-bit inputs (A and B) would produce the following three output conditions when the inputs are compared.


These outputs mean: A is greater than B, A is equal to B, and A is less than B.

This is useful if we want to compare two variables and produce an output when any of the above three conditions is achieved; for example, to produce an output from a counter when a certain count is reached. Consider the simple 1-bit comparator below. You may notice two distinct features about the comparator from the above truth table. Firstly, the circuit does not distinguish between two "0"s and two "1"s: an output A = B is produced when they are both equal, either A = B = "0" or A = B = "1". Secondly, the output condition for A = B resembles that of a commonly available logic gate, the Exclusive-NOR or Ex-NOR (equivalence) function, applied to each pair of bits, giving: Q = A XNOR B.

Fig 1.6 1-bit Digital Comparator Circuit

Digital comparators actually use Exclusive-NOR gates within their design for comparing their respective pairs of bits. When we are comparing two binary or BCD values or variables against each other, we are comparing the "magnitude" of these values, logic "0" against logic "1".

Inputs        Outputs
B   A    A>B  A=B  A<B
0   0     0    1    0
0   1     1    0    0
1   0     0    0    1
1   1     0    1    0

Table 1.4: Digital Comparator Truth Table
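The three outputs of the 1-bit comparator follow directly from this truth table (a gate-level sketch, using the Ex-NOR relation for equality):

```python
def comparator_1bit(a: int, b: int) -> tuple[int, int, int]:
    """1-bit magnitude comparator returning (A>B, A=B, A<B)."""
    gt = a & (1 - b)              # A.B'
    lt = (1 - a) & b              # A'.B
    eq = 1 - (a ^ b)              # A XNOR B
    return gt, eq, lt

print(comparator_1bit(0, 1))      # A < B
```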


As well as comparing individual bits, we can design larger comparators by cascading n of these together to produce an n-bit comparator, just as we did for the n-bit adder in the previous tutorial. Multi-bit comparators can be constructed to compare whole binary or BCD words and produce an output if one word is larger than, equal to, or less than the other.

A very good example of this is the 4-bit magnitude comparator. Here, two 4-bit words ("nibbles") are compared to each other to produce the relevant output, with one word connected to inputs A and the other, to be compared against it, connected to inputs B as shown below.

1.4.1 8-Bit Word Comparator

Fig 1.7 8-bit Word Comparator

When comparing large binary or BCD numbers like the example above, the comparator saves time by comparing the highest-order bit (MSB) first. If equality exists (A = B), it compares the next lower bit, and so on, until it reaches the lowest-order bit (LSB). If equality still exists, the two numbers are defined as equal. If inequality is found, either A > B or A < B, the relationship between the two numbers is determined and the comparison of any additional lower-order bits stops. Digital comparators are widely used in analogue-to-digital converters (ADCs) and arithmetic logic units (ALUs) to perform a variety of arithmetic operations.
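The MSB-first procedure can be sketched as follows (bit lists are assumed MSB-first; this models the behavior, not the gate structure of any particular comparator IC):

```python
def compare_words(a: list[int], b: list[int]) -> str:
    """Compare two equal-length words MSB-first; stop at the first
    unequal bit, exactly as described above."""
    for bit_a, bit_b in zip(a, b):
        if bit_a > bit_b:
            return "A > B"
        if bit_a < bit_b:
            return "A < B"
    return "A = B"

print(compare_words([1, 0, 1, 1], [1, 0, 0, 1]))
```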


1.5 HAMMING DISTANCE

The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ. Put another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. A major application is in coding theory, more specifically to block codes, in which the equal-length strings are vectors over a finite field.

For a fixed length n, the Hamming distance is a metric on the vector space of the words of length n (also known as a Hamming space), as it fulfills the conditions of non-negativity, identity of indiscernibles, and symmetry, and it can be shown by complete induction that it satisfies the triangle inequality as well. The Hamming distance between two words a and b can also be seen as the Hamming weight of a − b for an appropriate choice of the − operator. For binary strings a and b, the Hamming distance is equal to the number of ones (the population count) in a XOR b. The metric space of length-n binary strings, with the Hamming distance, is known as the Hamming cube; it is equivalent as a metric space to the set of distances between vertices in a hypercube graph. One can also view a binary string of length n as a vector in R^n by treating each symbol in the string as a real coordinate; with this embedding, the strings form the vertices of an n-dimensional hypercube, and the Hamming distance of the strings is equivalent to the Manhattan distance between the vertices.
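For binary words, the XOR/population-count relation gives a near one-line implementation (sketch):

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b), "Hamming distance needs equal lengths"
    return sum(x != y for x, y in zip(a, b))

def hamming_distance_bits(a: int, b: int) -> int:
    """For binary words: population count of a XOR b."""
    return bin(a ^ b).count("1")

print(hamming_distance("karolin", "kathrin"))   # 3
```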


CHAPTER 2
ERROR CORRECTION CODES
This chapter presents a new architecture that can reduce the latency and complexity of the data comparison by using the characteristics of systematic codes. In addition, a new processing element is presented to reduce the latency and complexity further.

2.1 DATAPATH DESIGN FOR SYSTEMATIC CODES

Fig. 2.1 Timing diagram of the tag match in (a) direct compare method
and (b) proposed architecture

Fig. 2.2 Systematic representation of an ECC codeword.

In the SA-based architecture, the comparison of two codewords is invoked after the incoming tag is encoded. Therefore, the critical path consists of a series of the encoding and the n-bit comparison, as shown in Fig. 2.1(a). This, however, does not consider the fact that, in practice, the ECC codeword is of a systematic form in which the data and parity parts are completely separated, as shown in Fig. 2.2. As the data part of a systematic codeword is exactly the same as the incoming tag field, it is immediately available for comparison, while the parity part becomes available only after the encoding is completed. Grounded on this fact, the comparison of the k-bit tags can be started before the remaining (n − k)-bit comparison of the parity bits. In the proposed architecture, therefore, the encoding process that generates the parity bits from the incoming tag is performed in parallel with the tag comparison, reducing the overall latency.
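This data path can be sketched behaviorally; the encoder `parity_of` below is a placeholder assumption (a single even-parity bit) standing in for whatever (n, k) systematic encoder is actually used:

```python
def parity_of(data_bits: list[int]) -> list[int]:
    """Placeholder systematic encoder: one even-parity bit over the data.
    A real design would generate the (n - k) ECC parity bits here."""
    return [sum(data_bits) % 2]

def encode_and_compare(tag: list[int], stored_data: list[int],
                       stored_parity: list[int]) -> int:
    """Hamming distance over a systematic codeword: the k data bits are
    compared immediately; the parity bits are compared after encoding."""
    d_data = sum(t != s for t, s in zip(tag, stored_data))   # starts at once
    d_parity = sum(p != s for p, s in
                   zip(parity_of(tag), stored_parity))       # after encoding
    return d_data + d_parity

print(encode_and_compare([1, 0, 1], [1, 0, 1], [0]))
```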


Fig. 2.3 Proposed architecture optimized for systematic codeword


The proposed architecture grounded on this data path design is shown in Fig. 2.3. It contains multiple butterfly-formed weight accumulators (BWAs) proposed to improve the latency and complexity of the Hamming distance computation. The basic function of the BWA is to count the number of 1's among its input bits. It consists of multiple stages of HAs as shown in Fig. 2.4(a), where each output bit of an HA is associated with a weight. The HAs in a stage are connected in a butterfly form so as to accumulate the carry bits and the sum bits of the upper stage separately. In other words, both inputs of an HA in a stage, except the first stage, are either carry bits or sum bits computed in the upper stage. This connection method leads to a property that if an output bit of an HA is set, the number of 1's among the bits in the paths reaching that HA is equal to the weight of the output bit. In Fig. 2.4(a), for example, if the carry bit of the gray-colored HA is set, the number of 1's among the associated input bits, i.e., A, B, C, and D, is 2. At the last stage of Fig. 2.4(a), the number of 1's among the input bits, d, can be calculated as

d = 8I + 4(J + K + M) + 2(L + N + O) + P.    (2)

Note that sum-bit lines are dotted for visibility. Since what we need is not the precise Hamming distance but the range it belongs to, it is possible to simplify the circuit. When rmax = 1, for example, two or more 1's among the input bits can be regarded as the same case, falling in the fourth range. In that case, we can replace several HAs with a simple OR-gate tree as shown in Fig. 2.4(b). This is an advantage over the SA, which resorts to the compulsory saturation expressed in (1).

Note that in Fig. 2.4, there is no overlap between any pair of carry-bit lines or any pair of sum-bit lines. As the overlaps exist only between carry-bit lines and sum-bit lines, it is not hard to resolve them in contemporary technology, which provides multiple routing layers, no matter how many bits a BWA takes.
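The butterfly accumulation can be sketched in software: each HA stage splits its outputs into a carry stream (weight 2) and a sum stream (weight 1), which are then counted separately. This is an assumption-level model of the general BWA structure, not a netlist:

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (carry, sum) for one half adder."""
    return a & b, a ^ b

def bwa_count(bits: list[int]) -> int:
    """Count the 1's among the input bits with half-adder stages:
    carries (weight 2) and sums (weight 1) are accumulated separately."""
    if len(bits) == 1:
        return bits[0]
    if len(bits) % 2:
        bits = bits + [0]                      # pad to an even width
    carries, sums = [], []
    for i in range(0, len(bits), 2):
        c, s = half_adder(bits[i], bits[i + 1])
        carries.append(c)
        sums.append(s)
    return 2 * bwa_count(carries) + bwa_count(sums)

print(bwa_count([1, 1, 0, 1, 1, 0, 1, 0]))     # five 1's among eight bits
```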

Fig. 2.4.Proposed BWA. (a) General structure and (b) new structure revised
for the matching of ECC-protected data.

We now explain the overall architecture in more detail. Each XOR stage generates the bitwise difference vector for either the data bits or the parity bits, and the following processing elements count the number of 1's in the vector, i.e., the Hamming distance. Each BWA at the first level is in the revised form shown in Fig. 2.4(b), and generates an output from the OR-gate tree and several weight bits from the HA trees. In the interconnection, such outputs are fed into their associated processing elements at the second level. The output of the OR-gate tree is connected to the subsequent OR-gate tree at the second level, and the remaining weight bits are connected to the second-level BWAs according to their weights. More precisely, the bits of weight w are connected to the BWA responsible for w-weight inputs. Each BWA at the second level is associated with a weight of a power of two that is less than or equal to Pmax, where Pmax is the largest power of two that is not greater than rmax + 1. As the weight bits associated with the fourth range are all ORed in the revised BWAs, there is no need to deal with powers of two larger than Pmax.

For example, let us consider a simple (8, 4) single-error-correction, double-error-detection code. The corresponding first- and second-level circuits are shown in Fig. 2.5. Note that the encoder and the XOR banks are not drawn in Fig. 2.5 for the sake of simplicity. Since rmax = 2, Pmax = 2, and there are only two BWAs, dealing with weights 2 and 1, at the second level. As the bits of weight 4 fall in the fourth range, they are ORed. The remaining bits associated with weight 2 or 1 are connected to their corresponding BWAs. Note that the interconnection induces no hardware complexity, since it can be achieved by a bunch of hard wiring. Taking the outputs of the preceding circuits, the decision unit finally determines whether the incoming tag matches the retrieved codeword by considering the four ranges of the Hamming distance. The decision unit is in fact a combinational logic block whose functionality is specified by a truth table taking the outputs of the preceding circuits as inputs. For the (8, 4) code whose first- and second-level circuits are shown in Fig. 2.5, the truth table for the decision unit is described in Table I. Since U and V cannot be set simultaneously, such cases are implicitly included in the don't-care terms in Table I.

Fig. 2.5. First and second level circuits for a (8, 4) code.
The complexity as well as the latency of combinational circuits depends heavily on the algorithm employed. In addition, as the complexity and the latency usually conflict with each other, it is unfortunately hard to derive an analytical and fully deterministic equation that shows the relationship between the number of gates and the latency for the proposed architecture, and also for the conventional SA-based architecture. To circumvent the difficulty of an analytical derivation, we instead present an expression that can be used to estimate the complexity and the latency by employing some variables for the nondeterministic parts. The complexity of the proposed architecture, C, can be expressed as

C = CXOR + CENC + CBWA(k) + CBWA(n − k) + C2nd + CDU,

where CXOR, CENC, C2nd, CDU, and CBWA(n) are the complexities of the XOR banks, the encoder, the second-level circuits, the decision unit, and a BWA for n inputs, respectively. Using the recurrence relation, CBWA(n) can be calculated as

CBWA(n) = ⌊n/2⌋ · CHA + CBWA(⌈n/2⌉) + CBWA(⌊n/2⌋),

where the seed value, CBWA(1), is 0. Note that when a + b = c, CBWA(a) + CBWA(b) ≤ CBWA(c) holds for all positive integers a, b, and c. Because of this inequality and the fact that an OR-gate tree for n inputs is always simpler than a BWA for n inputs, both CBWA(k) + CBWA(n − k) and C2nd are bounded by CBWA(n).
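The recurrence is easy to check numerically; as a sketch, the half-adder complexity CHA is taken as 1, so complexity is measured in half-adder units:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def c_bwa(n: int) -> int:
    """Number of half adders in a BWA for n inputs (seed: c_bwa(1) = 0)."""
    if n == 1:
        return 0
    return n // 2 + c_bwa((n + 1) // 2) + c_bwa(n // 2)

# c_bwa(a) + c_bwa(b) <= c_bwa(a + b) holds for all positive a, b
for n in (2, 4, 8, 16):
    print(n, c_bwa(n))
```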
The latency of the proposed architecture, L, can be expressed as

L = max(LXOR + LBWA(k), LENC + LXOR + LBWA(n − k)) + L2nd + LDU,

where LXOR, LENC, L2nd, LDU, and LBWA(n) are the latencies of an XOR bank, the encoder, the second-level circuits, the decision unit, and a BWA for n inputs, respectively. Note that the latencies of the OR-gate tree and the BWAs for x ≤ n inputs at the second level are all bounded by ⌈log2 n⌉. As one of the BWAs at the first level finishes earlier than the other, some components at the second level may start earlier. Similarly, some BWAs or the OR-gate tree at the second level may provide their outputs earlier to the decision unit, so that the unit can begin its operation without waiting for all of its inputs. In such cases, L2nd and LDU can be partially hidden by the critical path of the preceding circuits, and L becomes shorter than the given expression.

2.2 EXCLUSIVE OR

We know Gate, the OR Gate and the NOT Gate, we can build many other
types of logic gate functions, such as a NAND Gate and a NOR Gate or any other
type of digital logic function. But there are two other types of digital logic gates
which although they are not a basic gate in their own right as they are constructed by
combining together other logic gates, their output Boolean function is important
enough to be considered as complete logic gates.

These two "hybrid" logic gates are the Exclusive-OR (Ex-OR) gate and its complement, the Exclusive-NOR (Ex-NOR) gate. Previously, we saw that for a 2-input OR gate, if A = "1", OR B = "1", OR BOTH A and B = "1", then the output from the digital gate must also be at logic level "1", and because of this, this type of logic gate is known as an Inclusive-OR function. The gate gets its name from the fact that it includes the case Q = "1" when both A and B = "1".

If, however, a logic "1" output is obtained ONLY when A = "1" or ONLY when B = "1", but NOT both together at the same time, i.e., for the binary inputs "01" or "10", then the output is "1". This type of gate is known as an Exclusive-OR function, or more commonly an Ex-OR function for short. This is because its Boolean expression excludes the "OR BOTH" case of Q = "1" when both A and B = "1". In other words, the output of an Exclusive-OR gate ONLY goes "HIGH" when its two input terminals are at "DIFFERENT" logic levels with respect to each other: an odd number of logic "1"s on its inputs gives a logic "1" at the output. These two inputs can be at logic level "1" or at logic level "0", giving us the Boolean expression

Q = A ⊕ B = A.B' + A'.B

The Exclusive-OR gate function, or Ex-OR for short, is achieved by combining standard logic gates to form more complex gate functions that are used extensively in building arithmetic logic circuits, computational logic comparators, and error detection circuits.

The two-input Exclusive-OR gate is basically a modulo-two adder, since it gives the sum of two binary digits, and as a result it is more complex in design than other basic types of logic gate. The truth table, logic symbol, and implementation of a 2-input Exclusive-OR gate are shown below.

2-input Ex-OR Gate

B   A   Q
0   0   0
0   1   1
1   0   1
1   1   0

Boolean Expression: Q = A ⊕ B    (A OR B, but NOT both, gives Q)

Fig 2.6 2-input Ex-OR Gate

The truth table above shows that the output of an Exclusive-OR gate ONLY goes "HIGH" when its two input terminals are at "DIFFERENT" logic levels with respect to each other. If these two inputs, A and B, are both at logic level "1" or both at logic level "0", the output is "0", making the gate an "odd but not even" gate.

This ability of the Exclusive-OR gate to compare two logic levels and produce
an output value dependent upon the input condition is very useful in computational
logic circuits as it gives us the following Boolean expression of:

Q = A ⊕ B = A.B' + A'.B

The logic function implemented by a 2-input Ex-OR is stated as "A OR B, but NOT both" gives an output at Q. In general, an Ex-OR gate gives an output value of logic "1" ONLY when there is an ODD number of 1's on its inputs; if the two inputs are equal, the output is "0".

An Ex-OR function with more than two inputs is therefore called an "odd function" or modulo-2 sum (Mod-2-SUM), not an Ex-OR. This description can be expanded to apply to any number of individual inputs, as shown for a 3-input Ex-OR gate.

Exclusive-OR gates are used mainly to build circuits that perform arithmetic operations and calculations, especially adders and half adders, as they can provide a "carry-bit" function, or as a controlled inverter, where one input passes the binary data and the other input is supplied with a control signal.

Fig 2.7 Ex-OR Function Realisation using NAND gates
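The NAND-only realisation of Fig. 2.7 can be sketched behaviorally, assuming the standard four-NAND construction:

```python
def nand(a: int, b: int) -> int:
    """Single 2-input NAND gate."""
    return 1 - (a & b)

def xor_from_nand(a: int, b: int) -> int:
    """Ex-OR built from four NAND gates."""
    m = nand(a, b)
    return nand(nand(a, m), nand(b, m))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_from_nand(a, b))
```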

2.3 BINARY ADDITION

Binary addition follows the same basic rules as denary addition, except that in binary there are only two digits, with the largest digit being "1". So when adding binary numbers, a carry out is generated when the "SUM" equals or is greater than two (1 + 1), and this becomes a "CARRY" bit for any subsequent addition, being passed over to the next column for addition, and so on. Consider the single-bit addition below.

Binary Addition of Two Bits

  0     0     1     1
+ 0   + 1   + 0   + 1
  0     1     1    10  (sum 0, carry 1)

When the two single bits A and B are added together, the addition of "0 + 0", "0 + 1", and "1 + 0" results in either a "0" or a "1", until you get to the final column of "1 + 1"; then the sum is equal to "2". But the number two does not exist in binary; 2 in binary is equal to 10, in other words a zero for the sum plus an extra carry bit.

Then the operation of a simple adder requires two data inputs producing two
outputs, the Sum (S) of the equation and a Carry (C) bit as shown.

For the simple 1-bit addition problem above, the resulting carry bit could be
ignored but you may have noticed something else with regards to the addition of these
two bits, the sum of their binary addition resembles that of an Exclusive-OR Gate. If
we label the two bits as A and B then the resulting truth table is the sum of the two
bits but without the final carry.

Fig 2.8 Binary Adder Block Diagram


2-input Exclusive-OR Gate

B   A   S
0   0   0
0   1   1
1   0   1
1   1   0

Fig 2.9 2-input Exclusive-OR Gate

We can see from the truth table above that an Exclusive-OR gate only produces an output "1" when either input is at logic "1", but not both, the same as for the binary addition of the previous two bits. However, in order to perform the addition of two numbers, microprocessors and electronic calculators require the extra carry bit to calculate correctly, so we need to rewrite the previous summation to include two bits of output data, as shown below.

00 00 01 01

+ 00 + 01 + 00 + 01

00 01 01 10

2-input AND Gate

B   A   C
0   0   0
0   1   0
1   0   0
1   1   1

Fig 2.10 2-input AND Gate

By combining the Exclusive-OR gate with the AND gate, we obtain a simple digital binary adder circuit, known commonly as the "Half Adder" circuit.

2.4 HALF ADDER


A half adder is a logical circuit that performs an addition operation on two
binary digits. The half adder produces a sum and a carry value which are both binary
digits.
Half Adder Truth Table with Carry-Out

B   A   SUM   CARRY
0   0    0      0
0   1    1      0
1   0    1      0
1   1    0      1

Fig 2.11 Half Adder Truth Table with Carry-Out

For the SUM bit:

SUM = A XOR B = A ⊕ B

For the CARRY bit:

CARRY = A AND B = A.B

From the truth table of the half adder, we can see that the SUM (S) output is the result of the Exclusive-OR gate and the carry-out (Cout) is the result of the AND gate. The Boolean expressions for a half adder are therefore as given above.
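These two expressions give a direct sketch:

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Half adder: SUM = A XOR B, CARRY = A AND B."""
    return a ^ b, a & b

for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> sum={s} carry={c}")
```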

One major disadvantage of the half adder circuit when used as a binary adder is that there is no provision for a "carry-in" from the previous stage when adding multiple data bits. For example, suppose we want to add together two 8-bit bytes of data: any resulting carry bit would need to be able to "ripple" or move across the bit patterns starting from the least significant bit (LSB). The most complicated operation the half adder can do is "1 + 1", but as the half adder has no carry input, the resulting value would be incorrect. One simple way to overcome this problem is to use a full-adder-type binary adder circuit.
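The fix can be sketched here under the usual assumption that a full adder is built from two half adders plus an OR gate supplying the missing carry path:

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Half adder: (sum, carry)."""
    return a ^ b, a & b

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Full adder from two half adders and an OR gate: (sum, carry_out)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

print(full_adder(1, 1, 1))    # 1 + 1 + 1 = 11 in binary -> (sum=1, carry=1)
```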

CHAPTER 3

DESIGNING FOR LOW-POWER CONSUMPTION

3.1 INTRODUCTION

If we want a design for low power consumption to be successful, it is important to have a thorough understanding of the sources of power dissipation, the factors that affect them, and the methodologies and techniques that are available to achieve optimal results. Therefore, this thesis starts with a literature study in low-power and power-aware design. We present what we believe are the most important low-power methodologies and power optimization techniques available. Low-power design can be applied at various levels, such as the architectural level, the gate level, and the technology level. Apart from that, a number of alternative logic-design styles are presented to report on their characteristics regarding power consumption. This chapter could also be used by others as a quick study in the field of low-power/power-aware design.

3.2 LOW POWER DESIGN Vs POWER AWARE DESIGN

Nowadays, low-power design is a term that has become familiar to probably


everyone in the engineering field, because of the simple fact that we want to do the
same amount of work (or even more) with less power. Another design methodology
exists, which employs low-power design techniques, called power-aware design.
These methodologies are, however, not the same. Power-aware design is sometimes
confused with low-power design, but there is an elementary difference in the point of
view. Low-power design is a design methodology which focuses on minimizing
power of ascertains circuit. Since low-power design techniques may severely affect
the performance of a circuit (as will be explained later), a minimal performance
constraint can be set.

This means that the power consumption will be minimized without violating
this performance constraint. Power-aware systems are typically systems that have
limited power budgets but provide a respectable performance as well. When these
circuits are designed, low-power consumption is of importance as well, but the point
of view is different. In power-aware design, the performance is maximized subject to a power budget.

3.3 SOURCES OF POWER CONSUMPTION

There are a number of sources of power consumption in CMOS, which can be


subdivided into static and dynamic power dissipation. Dynamic power dissipation is primarily caused by the switching of the CMOS devices (MOSFETs) when logic values are changed (known as capacitive power or switching power). The amount of power that is dissipated is directly related to the switching activity (the number of logic transitions per clock cycle in the entire circuit), the clock frequency, the supply voltage, and the capacitive load the circuit drives.

Another source of dynamic power dissipation is short-circuit power.

CMOS comprises both PMOS and NMOS devices. During a logic transition, the PMOS and NMOS devices are simultaneously turned on for a very short period of time, allowing a short-circuit current to flow from VDD to ground. In real life there will also always be leakage currents, since CMOS devices are not perfect switches. The total power dissipation can be described by the following formula:

Ptotal = α(CL · VDD² · fclk) + Isc · VDD + Ileak · VDD (1)

The Greek letter α represents the switching activity in the circuit, expressed as a value between 0 (no switching activity at all) and 1 (maximum switching activity). CL is the capacitive load driven by the circuit. As can be seen in the formula, the dynamic power consumption depends on the square of VDD, which makes the supply voltage a very important factor in low-power design, as will be explained later. Isc and Ileak represent the total short-circuit and leakage current, respectively. It is important to realize that Isc is a variable, while Ileak is not. Isc depends on the charge carried by the short circuit per transition, the cycle time, and the total number of transitions [11]:

Isc = Qsc · f · α (2)

A better way to represent the formula would therefore be:

Ptotal = α(CL · VDD² · fclk + Qsc · fclk · VDD) + Ileak · VDD (3)

It is a misunderstanding that reducing power consumption will always lead to a reduction in energy as well.
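To make formula (3) concrete, the following minimal Python sketch evaluates its three terms; the parameter values are illustrative assumptions, not measured data.

```python
def total_power(alpha, c_load, vdd, f_clk, q_sc, i_leak):
    """Total CMOS power per formula (3):
    P = alpha*(C_L*VDD^2*f_clk + Q_sc*f_clk*VDD) + I_leak*VDD."""
    switching = alpha * c_load * vdd ** 2 * f_clk   # capacitive (switching) power
    short_circuit = alpha * q_sc * f_clk * vdd      # short-circuit power
    leakage = i_leak * vdd                          # static (leakage) power
    return switching + short_circuit + leakage

# Illustrative values: 10 pF load, 1.2 V supply, 100 MHz clock, activity 0.2.
p = total_power(alpha=0.2, c_load=10e-12, vdd=1.2, f_clk=100e6,
                q_sc=1e-13, i_leak=1e-6)
```

Note the quadratic dependence on VDD: halving the supply voltage cuts the switching term by a factor of four, which is exactly why voltage scaling is so effective.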

3.4 BASIC LOW-POWER DESIGN METHODOLOGIES

The following methodologies are the most powerful ones and are applicable to virtually every system. They include static voltage scaling, frequency scaling, various other kinds of voltage scaling (sometimes combined with frequency scaling), clock gating, and power gating. Finally, a section is dedicated to technology scaling.

3.4.1 Static Voltage Scaling

One way to decrease the power consumption significantly is to decrease the


supply voltage. As mentioned in section 3.3, dynamic power consumption depends quadratically on VDD. Voltage scaling is therefore the most effective method to limit the power consumption. However, lowering VDD comes at a price: the delay of the logic increases. In systems where we desire a high throughput and ask for the maximum performance of the technology being utilized, voltage scaling is not an option: if VDD were lowered, we would not be able to meet the performance requirements. In many situations, however, we do not ask for the maximum performance and we can safely lower VDD in order to save power.
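The power/delay trade-off can be sketched numerically. The quadratic power term follows formula (1); the delay model below is the commonly used alpha-power law, which is an assumption here (the text itself gives no delay formula), with illustrative parameter values.

```python
def dynamic_power(vdd, alpha=0.2, c_load=10e-12, f_clk=100e6):
    """Dynamic (switching) power: alpha * C_L * VDD^2 * f_clk."""
    return alpha * c_load * vdd ** 2 * f_clk

def gate_delay(vdd, vth=0.3, a=1.3, k=1.0):
    """Assumed alpha-power-law delay model: delay grows sharply
    as VDD approaches the threshold voltage VTH."""
    return k * vdd / (vdd - vth) ** a

# Scaling VDD from 1.2 V to 0.8 V: power drops quadratically (~56%),
# while the gate delay increases.
power_saving = 1 - dynamic_power(0.8) / dynamic_power(1.2)
delay_penalty = gate_delay(0.8) / gate_delay(1.2)
```

The sketch shows the essence of figure 3.1: each step down in VDD buys a quadratic power saving at the cost of a growing delay penalty.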

Fig 3.1 Graph displaying delay versus VDD

The next paragraph discusses how easily this process can be slowed down. Operating at the normal supply voltage theoretically assures correct operation for a certain period of time, but if we require high reliability and need to depend on the technology for long periods of time (such as in biomedical implants), it is wise to reduce VDD not only for power-saving purposes but also for increased reliability. A small reduction in the supply voltage can already substantially diminish the device degradation over time, since device aging is nearly exponentially dependent on VDD.

3.4.2 Frequency scaling

Apart from VDD, there is another variable in the equation of section 3.3 that intuitively suggests possibilities for reducing power: frequency. Of course, the clock frequency is bounded from below by the desired throughput of the system. However, a system rarely operates at its maximum throughput all the time. Often, the desired throughput is much lower than the maximum performance. Then, it is possible to lower the operating frequency in order to save power (and thus also energy), which is called frequency scaling. Sometimes voltage scaling and frequency scaling are employed simultaneously, such as in cell phones when they are in stand-by mode.

3.4.3 Multi-VDD and CVS

A system is often comprised of various subsystems or components. If one or


multiple components require maximum performance and cannot permit voltage scaling, this does not mean that the entire system is unsuitable for voltage scaling. It is possible to subdivide the system into blocks, each having its own supply voltage: high-VDD (normal VDD) or low-VDD. These supply voltages are, however, fixed. Therefore this technique is still referred to as static voltage scaling. A problem appears, however, when Multi-VDD is employed: signals cross from low-VDD to high-VDD blocks and vice versa. Crossing from high-VDD to low-VDD does not result in any problems, but a signal coming from a low-VDD block driving logic in a high-VDD block does. A low-VDD output signal driving a high-VDD circuit may fail to cut off the PMOS transistors (the PMOS transistors may get stuck in triode mode), which causes a static current to flow from VDD to ground in the high-VDD circuit. Even when low-VDD is high enough to cut off the PMOS transistors, there will be an increase in the rise and fall times at the receiving inputs, leading to higher switching currents and slower transition times. Slower transition times will ultimately cause the short-circuit current to last longer than necessary. The solution to these problems is level shifters (a.k.a. level converters). These are devices that can be placed in between the signals crossing from a low-VDD to a high-VDD block. The level shifter lifts the low-voltage signal to the level that is appropriate for the high-VDD powered block. Note that these components have a certain cost, and when many level shifters are implemented, they may contribute significantly to the total area and dynamic power consumption. Therefore, the total number of blocks should be limited, since too many level shifters may cancel the power savings. Another approach is CVS: Clustered Voltage Scaling. This technique avoids the implementation of many level shifters (and thus saves power) by clustering all critical and non-critical paths into only two separate clusters, one powered by low-VDD, the other by high-VDD. Before CVS is applied, many level shifters can reside along a path from an input to an output of a circuit if a high-VDD circuit is connected to a low-VDD circuit, which is in turn connected to a high-VDD circuit, and so on.

3.4.4 Dynamic Voltage Scaling

Instead of a fixed supply voltage during circuit operation it is also possible to


dynamically adjust the supply voltage based on the currently required performance of the circuit. The required performance is not always the same. When the required performance of the circuit is momentarily reduced, we can afford a lower supply voltage. This is called dynamic voltage scaling (DVS). A feedback control is required to control the voltage based on the required performance.

Ideally, we want access to an infinite amount of supply voltages, such that the
optimal voltage can be chosen. In reality, this is impossible, and we have to work with
a large but limited number of voltages. Apart from lowering the supply voltage, it is also possible to lower the clock frequency. A reduction in VDD will always increase the delay to some extent, but if the cycle time is still much higher than the delay of the circuit, energy is wasted. Therefore dynamic voltage and frequency scaling (DVFS) can be employed to save the maximum amount of energy.
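The earlier remark that lower power does not automatically mean lower energy can be illustrated: for a fixed workload of a given number of cycles, scaling only the frequency stretches the runtime so the dynamic energy stays the same, while scaling VDD together with the frequency (DVFS) reduces the energy. A sketch with illustrative numbers:

```python
def dynamic_energy(vdd, f_clk, cycles, alpha=0.2, c_load=10e-12):
    """Energy = power * time for a fixed workload of `cycles` cycles.
    Since time = cycles / f_clk, the frequency cancels out:
    E = alpha * C_L * VDD^2 * cycles."""
    power = alpha * c_load * vdd ** 2 * f_clk
    return power * (cycles / f_clk)

e_full = dynamic_energy(vdd=1.2, f_clk=100e6, cycles=1e6)
e_freq_only = dynamic_energy(vdd=1.2, f_clk=50e6, cycles=1e6)  # same energy
e_dvfs = dynamic_energy(vdd=0.8, f_clk=50e6, cycles=1e6)       # lower energy
```

Frequency scaling alone halves the power but doubles the runtime, so the energy per workload is unchanged; only the accompanying voltage reduction yields an energy saving.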

A variable supply voltage scheme can be divided into three elementary parts: the speed detector, the timing controller, and the buck converter, as depicted in figure 3.2. The timing controller contains two critical path replicas (CPR and CPR+), both powered by the reduced supply voltage VDDL. In this case the counter operates at 1 MHz, meaning that every 1 μs the counter can be increased or decreased. The counter's output is an integer N from 0 to 64, which is an indicator for the value of the supply voltage VDDL. The buck converter is comprised of a duty controller (clocked by a 64 MHz ring oscillator), a CMOS inverter, and a low-pass LC-filter. The buck converter is the part that creates VDDL from VDD. The idea is as follows. Within every consecutive 1 μs time span, the duty controller can turn VDD on for a period of time X (PMOS is on) and off (NMOS is on) for a period of time Y.

Fig 3.2 Variable supply voltage scheme

The ratio X/Y is determined by the integer N. Since the duty controller runs at 64 MHz (which is 64 times faster than the timing controller), and N can represent 64 values, we are able to create 64 different values of VDDL. For example, if N = 32, then VDD is turned on for 0.5 μs and for the remaining 0.5 μs the value is zero. The average value of VDDL is then 0.5 · VDD. The low-pass filter is placed off-chip. Finally, there is the feedback loop back to the speed detector. If VDD is 1 Volt, which is common in 90nm CMOS, this variable supply voltage scheme can provide a resolution of 15.6 mV (meaning that VDDL can be varied in steps as small as 15.6 mV). Obviously, the higher the frequency of the buck converter (and the larger the range of the counter), the finer the resolution, and the closer VDDL will be to the optimal value. The external frequency fext assumes some predefined values (based on different performance requirements), such that VDDL can be fine-tuned for the specific frequency.
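The relation between the counter value N and the averaged output voltage can be sketched as follows (a simplification that assumes the LC-filter delivers exactly the duty-cycle-weighted average of VDD):

```python
def vddl_from_counter(n, vdd=1.0, counter_max=64):
    """Averaged buck-converter output: within each 1 us span the PMOS
    conducts for n/counter_max of the time, so VDDL ~ (n/counter_max)*VDD."""
    if not 0 <= n <= counter_max:
        raise ValueError("counter value out of range")
    return (n / counter_max) * vdd

step = vddl_from_counter(1)    # 1/64 V = 15.625 mV, the stated resolution
half = vddl_from_counter(32)   # 0.5 V, matching the N = 32 example
```

With VDD = 1 V this reproduces both numbers from the text: a 15.6 mV step and VDDL = 0.5 · VDD for N = 32.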

The reason why this scheme is presented in such detail is mainly to provide a deeper insight into how dynamic voltage scaling works exactly, but also to show that a significant amount of overhead is required for this technique. Multi-level voltage scaling is a form of dynamic voltage scaling and essentially an extension of static voltage scaling. Based on the required performance, the supply voltage can be scaled between a small number of fixed and discrete voltage levels. The advantage of multi-level voltage scaling is that it is a significantly less expensive power scheme than DVS with a virtually infinite number of supply voltages.
The Dual Variable Supply-voltage scheme (Dual-VS) is a combination of DVS and clustered voltage scaling (CVS). First, the circuit is clustered into a high-VDD and a low-VDD cluster. Both supply voltages are variable and, in contrast with multi-level voltage scaling, non-discrete. The minimal voltage is controlled by feedback loops for both the high and low supply voltages.

Fig 3.3 Voltage dithering compared to other approaches

3.4.5 Voltage dithering

Assume that we, for example, require a rate (normalized frequency) of 0.5 for a certain period of time. If we do not implement a technique which allows us to dynamically adjust parameters such as voltage and frequency, and the circuit always operates at the maximum rate of 1, a dramatic portion of the total dissipated energy is wasted. When utilizing DVFS, fclk is lowered (in this case by 50%), and since a lower frequency is required the supply voltage can also be decreased. However, DVFS too has an important drawback: access to a vast (ideally infinite) number of different supply voltages requires a significant amount of hardware overhead.

A solution to this problem has been presented by multi-level voltage scaling,


utilizing a small number of discrete voltages, but this introduces a new problem. With only a few discrete voltages, the selected voltage will probably not be optimal, which results in a higher energy consumption than strictly necessary. A solution to both problems is voltage dithering. In voltage dithering, only two discrete voltage/frequency pairs are utilized. The overhead for voltage dithering is lower than for DVS, which can be explained by the fact that only two discrete supply voltages (and frequencies) are required, making the power scheme less complex.

3.4.6 Clock gating

In the previous sections we have referred to (sub)systems that do not always operate at their maximum performance. It is also possible that parts of a system are idle for a period of time: then no useful computational work is performed. Still, power is consumed. A subsystem being idle does not necessarily mean that the subsystem is not performing any computations. It only means that the results are not being utilized. This is possible when the subsystem is still fed with data, but the result is discarded because it is not needed at that moment. If there are large registers preceding these subsystems, this power dissipation can become quite significant. And, finally, there is the power dissipation of the clock network in the subsystem.

Fig 3.4 Clock gating mechanism

Clock networks are very expensive in terms of power. A major portion of the
total power consumption of the system is dissipated in the clock network (mainly in
the clock buffers/drivers). Considering the above, a lot of power can be saved when a subsystem is idle. One way to achieve this goal is to apply clock gating. This essentially means that the clock signal of the subsystem is cut off, which saves the power dissipated in the flip-flops and the clock network. If the combinational logic in the subsystem is fed by registers at the inputs, the logic will stop switching. It will, however, not save the leakage power.

For clock gating an additional signal is required: a clock-enable signal. Clock


gating is a very simple approach and can be implemented automatically by a synthesis tool. There are essentially two ways of implementing clock gating: flip-flop-free and flip-flop-based clock gating. Flip-flop-free clock gating is implemented by a simple AND-gate. This works fine as long as the clock-enable signal is stable in between the rising edges of the clock. If it is not, additional clock pulses can be generated or the clock gating can be terminated prematurely. A better approach is to insert a flip-flop for the clock-enable signal to avoid these problems. Both methods of clock gating are depicted in figure 3.4. Thus, clock gating requires some additional logic, but the costs are low.
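The hazard of the AND-gate approach can be illustrated with a small behavioral model in Python (a sketch of the timing behavior, not a hardware description): a plain AND-gate turns an enable change that happens while the clock is high into a spurious clock pulse, whereas sampling the enable during the low phase, as the flip-flop/latch-based variant does, suppresses it.

```python
def gate_ff_free(clk, en):
    """Flip-flop-free gating: gated_clk = clk AND en (hazard-prone)."""
    return [c & e for c, e in zip(clk, en)]

def gate_latched(clk, en):
    """Latch-based gating: the enable is captured while the clock is
    low and held during the high phase, then ANDed with the clock."""
    out, held = [], 0
    for c, e in zip(clk, en):
        if c == 0:            # transparent during the low phase
            held = e
        out.append(c & held)
    return out

def rising_edges(sig):
    return sum(1 for a, b in zip([0] + sig, sig) if (a, b) == (0, 1))

clk = [0, 1, 1, 1, 0, 0, 1, 1, 0]
en  = [0, 0, 1, 1, 0, 1, 1, 1, 0]  # rises mid-high-phase, then stays valid

spurious = rising_edges(gate_ff_free(clk, en))   # 2 pulses (one spurious)
clean = rising_edges(gate_latched(clk, en))      # 1 legitimate pulse
```

The AND-gate version emits an extra mid-cycle pulse; the latched version passes only the clock pulse for which the enable was stable before the rising edge.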

Clock gating is not only applicable to large subsystems, it can as well be


applied to simple registers that do not need to be updated every clock cycle. On some occasions, clock gating a few registers is sufficient to disable an entire subsystem. For example, when the input registers of an ALU are clock gated, the entire ALU can be "turned off" (desired when the ALU is computing in vain, since the results are not utilized). However, clock gating a register smaller than 3 bits is not efficient, considering the overhead of the clock gating mechanism.

3.4.7 Power gating

Power gating, however, has an important advantage over clock gating: it is capable of saving the static power of idle blocks as well, since it cuts off the power supply instead of the clock signal. In order to do that, blocks need to be placed onto separate "power islands", which can be powered on and off. In reality, low-leakage PMOS transistors (called switches) are placed in between every connection to VDD of the block, to create a virtual power network that can be turned on and off. These switches are controlled by power gating controllers. For these low-leakage PMOS switches, high-VTH transistors are often employed (the higher VTH, the lower the leakage current).

Transistors with low-VTH are suitable for high performance, but not for low-
leakage, and vice versa. Therefore, the transistors in the circuit have low threshold
voltages and the switches have high voltage thresholds. This is called a dual-threshold
voltage technology or MTCMOS (multi-threshold CMOS). These high-VTH
transistors do, however, cause a problem. Since VDD is low and VTH is relatively high, these transistors will be slow. In order to speed them up, we would need to resize them. When a block is asleep it costs some time to wake it up again, just as it costs some time to put a block to sleep.

This introduces additional delays. Also during wake-up and going to sleep,
still some leakage power is dissipated, which makes power gating not perfect. The essential criteria for implementing power gating are the size of the total leakage power component and how many blocks are idle and how often. The leakage power highly depends on the technology being utilized, and the impact of the leakage power highly depends on the system frequency being utilized. If the leakage component is significant and many blocks are idle for longer periods of time, power gating may be efficient. One should, however, be aware of the fact that power gating is much more difficult to implement than clock gating and leads to significantly higher costs (mainly because of all the switches that are required). It is also important to realize that power gating is much more invasive than clock gating. While clock gating does not affect the functionality of the system, power gating does. It affects inter-block communication and, as mentioned before, adds time delays to safely enter and exit power gated modes.

3.4.8 Technology Scaling

Another way to save energy is to improve and scale the technology. Over the last decades CMOS technology has improved and scaled from 10 μm in the early 1970s to 32 nm in 2010. Sizes as small as 11 nm are expected around the year 2015. Ideally, the electric fields remain constant while the voltages and linear dimensions scale with the scaling factor S. Therefore, the energy savings scale with S³ (VDD and the capacitance of the transistors scale with S, where power/energy has a quadratic dependence on VDD). In reality it is difficult to scale VTH along with VDD.

3.5 METHODOLOGIES AT ARCHITECTURAL LEVEL

Two methodologies at architectural level are presented, namely parallelization


and pipelining. Note that in both cases voltage scaling is required as well, and these methodologies are not applicable to every circuit, since their applicability depends on the implementation of the architecture and the constraints of the system.

3.5.1 Parallelization

Parallelization is essentially a technique where we trade power consumption for area. Take an ALU as an example, with a delay of T seconds, operating at a clock speed of 1/T Hz, at its maximal (normal) voltage VDD (Vmax). Now, if we reduce VDD to a value Vlow such that the delay of the ALU is no longer T but 2T (which allows a VDD reduction of roughly 40%), we are no longer able to run at 1/T Hz. However, if we duplicate the ALU and feed both outputs to a multiplexer, we are still able to deliver the same performance. For convenience, the division by two (of the frequency) and the multiplication by two (of the capacitance) can be cancelled out. This leads us to the following upper boundary for power saving by parallelization: Ppar ≈ (Vlow/Vmax)² · Pref.
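This bound can be checked numerically (a sketch: the ~40% voltage reduction for a doubled delay is the figure quoted above, and the extra 10% capacitance for the multiplexer is an assumed overhead):

```python
def dyn_power(c, vdd, f, activity=1.0):
    """Dynamic power: activity * C * VDD^2 * f (normalized units)."""
    return activity * c * vdd ** 2 * f

p_ref = dyn_power(c=1.0, vdd=1.0, f=1.0)
# Duplicate the ALU (2x capacitance plus ~10% multiplexer overhead),
# halve the clock, and reduce VDD by roughly 40% (delay doubles).
p_par = dyn_power(c=2.2, vdd=0.6, f=0.5)
saving = 1 - p_par / p_ref   # roughly 60% power saving
```

The throughput is unchanged (two results per 2T equals one per T), yet the power drops by roughly the square of the voltage-reduction factor, minus the multiplexer overhead.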

3.5.2 Pipelining

Another effective technique to reduce power is pipelining. Again we take an


ALU as an example, with a delay of T seconds, operating at a clock speed of 1/T Hz, at its maximal (normal) voltage VDD (Vmax). If we can somehow subdivide the ALU into two blocks and place a pipeline register in between (depicted in figure 3.5), the latency of the ALU increases from T to 2T, but this does not have to be a problem since a result can still be produced every cycle. Now that the critical path in both blocks is much shorter (say, exactly half what it was before), we are allowed to decrease the supply voltage.

This results in the following:

Ppipe = Cpipe · V²pipe · fpipe (9)
Ppipe = (βCref)(αVref)² fref (10)
Ppipe = α²β · Pref (11)

Here α is the voltage-reduction factor and β accounts for the slightly increased switched capacitance due to the extra pipeline registers. So, also here, the power savings are upper bounded by the supply voltage reduction; because the total switched capacitance has increased slightly, the bound is a slightly looser one.

The supply voltage can be reduced to the same extent as in parallelization. This technique is, however, much more interesting for designs with limited area budgets. However, if there is a feedback loop in the circuit, pipelining cannot be employed. Note that pipelining and parallelization can also be employed simultaneously to obtain even larger power savings. The same upper bound holds true for the combined method, but it can be even smaller, since the allowed delay has now been relaxed to 4T instead of 2T. A voltage reduction of approximately 60% (Vlow = 0.4 · Vmax, the point where the delay has quadrupled) is now possible. Note that the relation between VDD and delay may vary between different types of technology, so the numbers we presented serve only as an indication.

Fig 3.5 Pipelining principle

3.6 OPTIMIZATIONS AT GATE LEVEL

At lower abstraction levels, techniques can also be applied to decrease the


power consumption. At gate level, these techniques focus on optimization of the net
list in order to reduce power. We present three important optimization techniques:
path balancing, high-activity net remapping, and fan-in decomposition.

3.6.1 Path balancing

Spurious transitions are a significant problem in combinational designs, since


the portion of the total power consumption of a combinational circuit that is caused by spurious transitions can be as high as 10 to 40 percent [11]. Spurious transitions are useless transitions: they do not contribute to the real computation, and thus the power that is dissipated during these transitions is wasted. Their occurrence is caused by timing differences between different paths leading to the same logic element. For example, when a two-input XOR-gate does not receive both inputs simultaneously, but with a certain delay between input 1 and input 2, a spurious transition may occur. This example is depicted in figure 3.6. On the left side of the figure (a), paths 1 and 2 are assumed to have no delay. The signals on inputs A and B of the gate switch at exactly the same time, such that the output signal Z remains unaltered. On the right side (b),


Figure 3.6: Example of a spurious transition due to unequal path delays

Table 3.1 Input capacitance of a 4-input AND gate

The paths do have a certain delay, as they will have in a real-life situation. Path 1, however, has a slightly longer delay than path 2, causing a spurious transition in output Z. These undesired transitions can be eliminated by path balancing (a.k.a. path equalization), a technique that makes sure that the delay of all paths that converge at each gate is about equal [11, 21]. This can be done by inserting unit-delay buffers in the paths that are shorter than the others.
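The XOR example can be reproduced with a tiny discrete-time sketch: both inputs change at t = 0 but reach the gate after their respective path delays. With equal delays the output never changes; with unequal delays a spurious pulse appears (the delay values are illustrative).

```python
def xor_trace(d1, d2, t_end=10):
    """Z = A xor B over discrete time steps. A switches 0->1 and B
    switches 1->0 at t = 0, arriving after path delays d1 and d2."""
    trace = []
    for t in range(t_end):
        a = 1 if t >= d1 else 0
        b = 0 if t >= d2 else 1
        trace.append(a ^ b)
    return trace

def transitions(sig):
    return sum(1 for x, y in zip(sig, sig[1:]) if x != y)

balanced = transitions(xor_trace(d1=2, d2=2))     # 0: Z stays constant
unbalanced = transitions(xor_trace(d1=4, d2=2))   # 2: a spurious pulse on Z
```

Inserting two unit delays of buffering into the shorter path (making d2 = 4) equalizes the arrival times and removes the glitch again.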

3.6.2 High-Activity Net Remapping

Since an n-input gate (e.g. a 4-input AND-gate) can have significant differences in input capacitance between its various inputs, it is wise to connect the net with the highest switching activity to the input pin with the lowest input capacitance and vice versa, in order to reduce power (the higher the capacitive load, the higher the power dissipation). For example, the input capacitance of a 4-input AND-gate in UMC 90nm


technology can be observed in table 3.1. High input capacitances result in slow logic and high power consumption.

3.6.3 Fan-in Decomposition

It is better to decompose a gate with a high fan-in into a network of multiple gates with a low fan-in, which significantly reduces the total capacitance. For example, in the UMC 90nm library, gates have a maximum fan-in of four inputs. Typically, the synthesis tool takes care of these optimizations. For example, a 16-input AND-gate implemented in VHDL will be decomposed by the synthesis tool and implemented by e.g. a tree of 4-input AND-gates.

3.7 OPTIMIZATIONS AT TECHNOLOGY LEVEL

At the lowest abstraction level, the technology (or transistor) level, a number of optimizations can be performed in order to reduce power consumption. When no low-power methodologies have been applied to the circuit, optimizations at the technology level will not be necessary. But if the supply voltage is altered and the delay of the circuit is compromised by low-power design methodologies, optimizations at technology level may be desired. Optimizations at technology level include adjusting the threshold voltage of the transistors and/or altering their sizes.

3.7.1 Resizing Transistors


One should know that the standard size of the transistors (MOSFETs) is based on the normal, maximum supply voltage. When VDD is reduced, the size of the transistors is no longer optimal. Cref is the gate capacitance of a transistor with the smallest possible W/L-ratio (N = 1). If P = 0, then energy is indeed linearly dependent on N. If P > 0, this is no longer the case. For higher values of P, we can afford to increase the size of the transistors and thus decrease their delay, without increasing the energy consumption. In fact, the energy consumption first decreases for increasing N, and then starts increasing again. This means that for every value of P, there is an optimal value for N. Note that for P = 0.5 in figure 3.7, the optimal N is still 1, since E does not decrease when N increases. The benefit of resizing is that we can compensate for the speed loss due to the VDD reduction. By increasing the transistor size, the speed of the transistors goes up again. For relatively small values of P, the energy consumption will increase, but E increases only sub-linearly with N.

3.7.2 Optimizing the VDD/VTH ratio

As mentioned before, voltage scaling is limited by the threshold voltage of the CMOS devices. The closer VDD gets to VTH, the larger the delay penalty will be. If a significant reduction in VDD is required, but this results in unacceptable delays (when VDD < 4VTH), it is possible to decrease VTH as well in order to keep VDD > 4VTH, and thus keep the delay penalty acceptable. However, we have to be aware that the threshold voltage is also an important factor in the leakage energy. Even a small reduction in VTH leads to a significant increase in Eleak, since


Eleak has an exponential dependence on VTH. So there is a limit to VDD/VTH scaling. An optimum for a given fclk exists, and is reached when Edynamic ≈ Eleak. Others claim the optimum lies at Edynamic ≈ 2Eleak.

Fig 3.7 Energy consumption versus scaling factor N for various values of P

In conclusion, simultaneously optimizing VDD, VTH, and the transistor size will lead to the optimal result. By not only reducing VDD but also optimizing VTH and the transistor size, it is possible to achieve significant energy savings without compromising the speed of the circuit.

3.8 DIFFERENT DIGITAL LOGIC STYLES

Standard CMOS is the most common and widely utilized digital logic in
almost any application field. Still, other digital logic styles exist and are utilized in the
industry as well. Since we aim in this thesis work for special design characteristics,
such as very low power consumption and very small sized designs, it is fair to have a
look at other digital logic design styles as well. First the fundamental differences
between static and dynamic logic will be explained. Then, the three most interesting
alternative digital logic styles are discussed.

3.8.1 Static Vs Dynamic Logic

In static logic circuits the clock signal is only utilized for memory cells (flip-flops). Pure combinational circuits do not need a clock signal at all. In dynamic logic, all cells are clocked (this is the reason why dynamic logic is also referred to as clocked logic), even if the circuit is purely combinational. This may seem odd, especially given the fact that the clock network is one of the largest energy consumers in almost any design, but dynamic logic provides a number of advantages. Dynamic logic is actually commonly utilized in computer memories nowadays. All types of DRAM (Dynamic Random Access Memory) are dynamic logic. Another well-known application of dynamic logic is domino logic.

Fig 3.8 AND-gate implemented in static and dynamic logic


CHAPTER 4
VERILOG

4.1 OVERVIEW

Hardware description languages such as Verilog differ from


software programming languages because they include ways of describing the
propagation of time and signal dependencies (sensitivity). There are two assignment
operators, a blocking assignment (=), and a non-blocking (<=) assignment.

The non-blocking assignment allows designers to describe a state-machine


update without needing to declare and use temporary storage variables (in any general
programming language we need to define some temporary storage spaces for the
operands to be operated on subsequently; those are temporary storage variables).
Since these concepts are part of Verilog's language semantics, designers could quickly
write descriptions of large circuits in a relatively compact and concise form.
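The effect of the non-blocking assignment can be mimicked in plain Python (an analogy only, not Verilog semantics themselves): a "blocking-style" sequential swap of two registers corrupts the state without a temporary, while a "non-blocking-style" two-phase update evaluates all right-hand sides against the old state before committing, much like <= at a clock edge.

```python
# Blocking-style: each statement takes effect immediately.
a, b = 1, 2
a = b          # a becomes 2 ...
b = a          # ... so b also becomes 2: the old value of a is lost
blocking_result = (a, b)       # (2, 2)

# Non-blocking-style: evaluate every next-state value from the old
# state, then update all registers at once.
a, b = 1, 2
next_a, next_b = b, a          # right-hand sides use old values
a, b = next_a, next_b          # commit simultaneously
nonblocking_result = (a, b)    # (2, 1): a clean swap
```

In Verilog the simulator performs this two-phase bookkeeping itself for <= assignments, which is why the designer does not have to declare the temporary variables explicitly.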

At the time of Verilog's introduction (1984), Verilog represented a tremendous


productivity improvement for circuit designers who were already using graphical schematic capture software and specially written software programs to document and simulate electronic circuits.

The designers of Verilog wanted a language with syntax similar to the


C programming language, which was already widely used in engineering software development. Verilog is case-sensitive, has a basic preprocessor (though less sophisticated than that of ANSI C/C++), equivalent control flow keywords (if/else, for, while, case, etc.), and compatible operator precedence. Syntactic differences include variable declaration (Verilog requires bit-widths on net/reg types), demarcation of procedural blocks (begin/end instead of curly braces {}), and many other minor differences.

A Verilog design consists of a hierarchy of modules. Modules encapsulate


design hierarchy, and communicate with other modules through a set of declared
input, output, and bidirectional ports. Internally, a module can contain any
combination of the following: net/variable declarations (wire, reg, integer, etc.),
concurrent and sequential statement blocks, and instances of other modules (sub-
hierarchies).

Sequential statements are placed inside a begin/end block and executed in
sequential order within the block. But the blocks themselves are executed
concurrently, qualifying Verilog as a dataflow language.

Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0,


floating, undefined") and strengths (strong, weak, etc.). This system allows abstract
modeling of shared signal lines, where multiple sources drive a common net. When a
wire has multiple drivers, the wire's (readable) value is resolved by a function of the
source drivers and their strengths.

A subset of statements in the Verilog language is synthesizable. Verilog


modules that conform to a synthesizable coding style, known as RTL (register-transfer
level), can be physically realized by synthesis software. Synthesis software
algorithmically transforms the (abstract) Verilog source into a net list, a logically
equivalent description consisting only of elementary logic primitives (AND, OR,
NOT, flip-flops, etc.) that are available in a specific FPGA or VLSI technology.
Further manipulations to the net list ultimately lead to a circuit fabrication blueprint
(such as a photo mask set for an ASIC or a bit stream file for an FPGA).

4.2. HISTORY

4.2.1. Beginning

Verilog was the first modern hardware description language to be invented. It was created by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at Automated Integrated Design Systems (later renamed Gateway Design Automation in 1985) as a hardware modeling language. Gateway Design Automation was purchased by Cadence Design Systems in 1990.

Cadence now has full proprietary rights to Gateway's Verilog and the Verilog-XL, the HDL simulator that would become the de facto standard (of Verilog logic simulators) for the next decade. Originally, Verilog was intended to describe and allow simulation; only afterwards was support for synthesis added.

4.2.2. Verilog-95

With the increasing success of VHDL at the time, Cadence decided to make
the language available for open standardization. Cadence transferred Verilog into the
public domain under the Open Verilog International (OVI) (now known as Accellera)
organization. Verilog was later submitted to IEEE and became IEEE Standard 1364-
1995, commonly referred to as Verilog-95.

In the same time frame Cadence initiated the creation of Verilog-A to put standards support behind its analog simulator Spectre. Verilog-A was never intended

to be a standalone language and is a subset of Verilog-AMS which encompassed
Verilog-95.

4.2.3. Verilog 2001

Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that users had found in the original Verilog standard. These extensions became IEEE Standard 1364-2001, known as Verilog-2001.

Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for (2's complement) signed nets and variables. Previously, code authors had to perform signed operations using awkward bit-level manipulations (for example, the carry-out bit of a simple 8-bit addition required an explicit description of the Boolean algebra to determine its correct value). The same function under Verilog-2001 can be more succinctly described by one of the built-in operators: +, -, /, *, >>>.
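As a brief illustration of the signed support, consider this sketch (the module and signal names are invented for the example, not taken from the source):

```verilog
module signed_demo;
  reg  signed [7:0] a, b;   // Verilog-2001 signed variables
  wire signed [8:0] sum;    // one extra bit holds the sign/carry

  assign sum = a + b;       // signed addition, no manual Boolean algebra

  initial begin
    a = -8'sd100;
    b = -8'sd50;
    #1 $display("sum = %0d", sum);  // displays -150
  end
endmodule
```

Under Verilog-95 the same result would have required unsigned vectors plus hand-written sign handling.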

A generate/endgenerate construct (similar to VHDL's generate/end generate) allows Verilog-2001 to control instance and statement instantiation through normal decision operators (case/if/else). Using generate/endgenerate, Verilog-2001 can instantiate an array of instances, with control over the connectivity of the individual instances. File I/O has been improved by several new system tasks. And finally, a few syntax additions were introduced to improve code readability (e.g. always @*, named parameter override, C-style function/task/module header declaration). Verilog-2001 is the dominant flavor of Verilog supported by the majority of commercial EDA software packages.
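A sketch of the generate construct (the ripple-carry adder below is illustrative, not an example from the source):

```verilog
module ripple_adder #(parameter N = 4) (
  input  [N-1:0] a, b,
  input          cin,
  output [N-1:0] sum,
  output         cout
);
  wire [N:0] c;          // carry chain, one bit per stage plus carry-in
  assign c[0] = cin;

  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : stage
      // each loop iteration instantiates one full-adder stage
      assign sum[i]  = a[i] ^ b[i] ^ c[i];
      assign c[i+1]  = (a[i] & b[i]) | (c[i] & (a[i] ^ b[i]));
    end
  endgenerate

  assign cout = c[N];
endmodule
```

Changing the parameter N resizes the whole structure, which is exactly the kind of controlled replication the generate construct was added for.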

4.2.4. Verilog 2005

Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005) consists of minor corrections, spec clarifications, and a few new language features (such as the uwire keyword).

A separate part of the Verilog standard, Verilog-AMS, attempts to integrate analog and mixed signal modeling with traditional Verilog.

4.2.5. System Verilog

SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to aid design verification and design modeling. As of 2009, the SystemVerilog and Verilog language standards were merged into SystemVerilog 2009 (IEEE Standard 1800-2009).

The advent of hardware verification languages such as OpenVera and Verisity's e language encouraged the development of Superlog by Co-Design Automation Inc., which was later purchased by Synopsys. The foundations of Superlog and Vera were donated to Accellera, which later became the IEEE standard P1800-2005: SystemVerilog.

In the late 1990s, the Verilog Hardware Description Language (HDL) became the most widely used language for describing hardware for simulation and synthesis. However, the first two versions standardized by the IEEE (1364-1995 and 1364-2001) had only simple constructs for creating tests. As design sizes outgrew the verification capabilities of the language, commercial Hardware Verification Languages (HVL) such as OpenVera and e were created. Companies that did not want to pay for these tools instead spent hundreds of man-years creating their own custom tools.

This productivity crisis (along with a similar one on the design side) led to the
creation of Accellera, a consortium of EDA companies and users who wanted to
create the next generation of Verilog. The donation of the Open-Vera language
formed the basis for the HVL features of SystemVerilog. Accellera's goal was met in November 2005 with the adoption of the IEEE standard P1800-2005 for SystemVerilog, IEEE (2005).

The most valuable benefit of SystemVerilog is that it allows the user to construct reliable, repeatable verification environments, in a consistent syntax, that can be used across multiple projects. Some of the typical features of an HVL that distinguish it from a Hardware Description Language such as Verilog or VHDL are:

• Constrained-random stimulus generation
• Functional coverage
• Higher-level structures, especially Object Oriented Programming
• Multi-threading and interprocess communication
• Support for HDL types such as Verilog's 4-state values
There are many other useful features, but these allow you to create test benches at
a higher level of abstraction than you are able to achieve with an HDL or a
programming language such as C.

SystemVerilog provides the best framework to achieve coverage-driven verification (CDV). CDV combines automatic test generation, self-checking test benches, and coverage metrics to significantly reduce the time spent verifying a design. The purpose of CDV is to:

i. Eliminate the effort and time spent creating hundreds of tests.
ii. Ensure thorough verification using up-front goal setting.
iii. Receive early error notifications and deploy run-time checking and error analysis to simplify debugging.

4.3 EXAMPLES

Ex1: A hello world program looks like this

module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule

Ex2: A simple example of two flip-flops follows:

module toplevel(clock, reset);
  input clock;
  input reset;

  reg flop1;
  reg flop2;

  always @(posedge reset or posedge clock)
    if (reset)
      begin
        flop1 <= 0;
        flop2 <= 1;
      end
    else
      begin
        flop1 <= flop2;
        flop2 <= flop1;
      end
endmodule

The "<=" operator in Verilog is another aspect of its being a hardware
description language as opposed to a normal procedural language. This is known as a
"non-blocking" assignment. Its action doesn't register until the next clock cycle.

This means that the order of the assignments is irrelevant and will produce the same result: flop1 and flop2 will swap values every clock. The other assignment operator, "=", is referred to as a blocking assignment. When "=" assignment is used, for the purposes of logic, the target variable is updated immediately.

In the above example, had the statements used the "=" blocking operator
instead of "<=", flop1 and flop2 would not have been swapped. Instead, as in
traditional programming, the compiler would understand to simply set flop1 equal to
flop2 (and subsequently ignore the redundant logic to set flop2 equal to flop1.)
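The contrast can be sketched side by side, reusing the signal names from Ex2 (the blocking version is shown commented out, since two blocks may not drive the same registers at once):

```verilog
// Non-blocking (<=): both right-hand sides are sampled first,
// then the updates happen together - flop1 and flop2 swap.
always @(posedge clock)
  begin
    flop1 <= flop2;
    flop2 <= flop1;
  end

// Blocking (=): flop1 changes immediately, so the second line
// copies the NEW flop1 back - no swap, both end up equal.
// always @(posedge clock)
//   begin
//     flop1 = flop2;
//     flop2 = flop1;
//   end
```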

Ex3: An example counter circuit follows:

module Div20x (rst, clk, cet, cep, count, tc);
// TITLE 'Divide-by-20 Counter with enables'
// enable CEP is a clock enable only
// enable CET is a clock enable and
// enables the TC output
// a counter using the Verilog language

parameter size   = 5;
parameter length = 20;

input rst;  // These inputs/outputs represent
input clk;  // connections to the module.
input cet;
input cep;

output [size-1:0] count;
output tc;

reg [size-1:0] count;  // Signals assigned
                       // within an always
                       // (or initial) block
                       // must be of type reg

wire tc;  // Other signals are of type wire

// The always statement below is a parallel
// execution statement that
// executes any time the signals
// rst or clk transition from low to high

always @(posedge clk or posedge rst)
  if (rst)  // This causes reset of the counter
    count <= {size{1'b0}};
  else
    if (cet && cep)  // Enables both true
      begin
        if (count == length-1)
          count <= {size{1'b0}};
        else
          count <= count + 1'b1;
      end

// the value of tc is continuously assigned
// the value of the expression
assign tc = (cet && (count == length-1));

endmodule
Ex4: An example of delays:

reg a, b, c, d;
wire e;

always @(b or e)
  begin
    a = b & e;
    b = a | b;
    #5 c = b;
    d = #6 c ^ e;
  end

The always clause above illustrates the other method of use: it executes any time any of the entities in its sensitivity list change, i.e. when b or e change. When one of these changes, a is immediately assigned a new value, and, due to the blocking assignment, b is assigned a new value afterward (taking into account the new value of a). After a delay of 5 time units, c is assigned the value of b, and the value of c ^ e is tucked away in an invisible store.

Then after 6 more time units, d is assigned the value that was tucked
away. Signals that are driven from within a process (an initial or always block) must
be of type reg. Signals that are driven from outside a process must be of type wire.
The keyword reg does not necessarily imply a hardware register.
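A minimal sketch of this driving rule (the module and signal names are invented for illustration):

```verilog
module drive_rules (input a, input b,
                    output w,      // driven outside a process: a wire (the default net type)
                    output reg r); // driven inside a process: must be reg
  assign w = a & b;  // continuous assignment drives the wire

  always @(a or b)   // procedural block drives the reg
    r = a | b;
endmodule
```

Note that r here synthesizes to pure combinational logic despite being declared reg, which is what the last sentence above is warning about.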

4.4 CONSTANTS

The definition of constants in Verilog supports the addition of a width parameter. The basic syntax is:

<Width in bits>'<base letter><number>

Examples:

• 12'h123 - Hexadecimal 123 (using 12 bits)
• 20'd44 - Decimal 44 (using 20 bits - 0 extension is automatic)
• 4'b1010 - Binary 1010 (using 4 bits)
• 6'o77 - Octal 77 (using 6 bits)
4.4.1. Synthesizable Constructs

There are several statements in Verilog that have no analog in real hardware, e.g. $display. Consequently, much of the language cannot be used to describe hardware. The examples presented here are the classic subset of the language that has a direct mapping to real gates.

// Mux examples - Three ways to do the same thing.

// The first example uses continuous assignment
wire out;
assign out = sel ? a : b;

// The second example uses a procedure
// to accomplish the same thing.
reg out;
always @(a or b or sel)
  begin
    case (sel)
      1'b0: out = b;
      1'b1: out = a;
    endcase
  end

// Finally - you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
  if (sel)
    out = a;
  else
    out = b;

The next interesting structure is a transparent latch; it will pass the input to
the output when the gate signal is set for "pass-through", and captures the input and
stores it upon transition of the gate signal to "hold".

The output will remain stable regardless of the input signal while the gate
is set to "hold". In the example below the "pass-through" level of the gate would be
when the value of the if clause is true, i.e. gate = 1.

This is read "if gate is true, the din is fed to latch_out continuously." Once
the if clause is false, the last value at latch_out will remain and is independent of the
value of din.

EX6: // Transparent latch example

reg out;
always @(gate or din)
  if (gate)
    out = din; // Pass through state
// Note that the else isn't required here. The variable
// out will follow the value of din while gate is high.
// When gate goes low, out will remain constant.

The flip-flop is the next significant template; in Verilog, the D-flop is the
simplest, and it can be modeled as:

reg q;
always @(posedge clk)
  q <= d;

The significant thing to notice in the example is the use of the non-blocking assignment. A basic rule of thumb is to use <= when there is a posedge or negedge statement within the always clause.

A variant of the D-flop is one with an asynchronous reset; there is a convention that the reset state will be the first if clause within the statement.

reg q;
always @(posedge clk or posedge reset)
  if (reset)
    q <= 0;
  else
    q <= d;

The next variant is including both an asynchronous reset and asynchronous set
condition; again the convention comes into play, i.e. the reset term is followed by the
set term.

reg q;
always @(posedge clk or posedge reset or posedge set)
  if (reset)
    q <= 0;
  else
    if (set)
      q <= 1;
    else
      q <= d;

Note: If this model is used to model a Set/Reset flip flop then simulation errors can result. Consider the following test sequence of events: 1) reset goes high, 2) clk goes high, 3) set goes high, 4) clk goes high again, 5) reset goes low, followed by 6) set going low. Assume no setup and hold violations.

In this example the always @ statement would first execute when the rising
edge of reset occurs which would place q to a value of 0. The next time the always
block executes would be the rising edge of clk which again would keep q at a value of
0. The always block then executes when set goes high which because reset is high
forces q to remain at 0.

This condition may or may not be correct depending on the actual flip flop.
However, this is not the main problem with this model. Notice that when reset goes
low, that set is still high. In a real flip flop this will cause the output to go to a 1.
However, in this model it will not occur because the always block is triggered
by rising edges of set and reset - not levels. A different approach may be necessary for
set/reset flip flops.

Note that there are no "initial" blocks mentioned in this description. There is a
split between FPGA and ASIC synthesis tools on this structure. FPGA tools allow
initial blocks where reg values are established instead of using a "reset" signal.

ASIC synthesis tools don't support such a statement. The reason is that an
FPGA's initial state is something that is downloaded into the memory tables of the
FPGA. An ASIC is an actual hardware implementation.

4.5 INITIAL VERSUS ALWAYS

There are two separate ways of declaring a Verilog process: the always and the initial keywords. The always keyword indicates a free-running process. The initial keyword indicates a process executes exactly once. Both constructs begin execution at simulator time 0, and both execute until the end of the block. Once an always block has reached its end, it is rescheduled (again). It is a common misconception to believe that an initial block will execute before an always block. In fact, it is better to think of the initial-block as a special-case of the always-block, one which terminates after it completes for the first time.

//Examples:

initial

begin

a = 1; // Assign a value to reg a at time 0

#1; // Wait 1 time unit

b = a; // Assign the value of reg a to reg b

end

always @(a or b) // Any time a or b CHANGE, run the process

begin

if(a)

c = b;

else
d = ~b;

end//Done with this block, now return to the top (i.e. the @ event-control)

always @(posedge a) // Run whenever reg a has a low to high change

a <= b;

These are the classic uses for these two keywords, but there are two significant additional uses. The most common of these is an always keyword without the @(...) sensitivity list. It is possible to use always as shown below:

always
  begin // Always begins executing at time 0 and NEVER stops
    clk = 0; // Set clk to 0
    #1;      // Wait for 1 time unit
    clk = 1; // Set clk to 1
    #1;      // Wait 1 time unit
  end // Keeps executing - so continue back at the top of the begin

The always keyword acts similar to the "C" construct while(1) {..} in the sense
that it will execute forever.

The other interesting exception is the use of the initial keyword with the
addition of the forever keyword.
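For example, the clock generator above can be rewritten with initial and forever (a sketch equivalent to the always version):

```verilog
initial
  begin
    clk = 0;        // establish a known starting value once
    forever         // then loop endlessly, like the always form
      begin
        #1 clk = ~clk;  // toggle every time unit
      end
  end
```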

4.7 OPERATORS

Note: These operators are not shown in order of precedence.

Operator type   Operator    Operation performed

Bitwise         ~           Bitwise NOT (1's complement)
                &           Bitwise AND
                |           Bitwise OR
                ^           Bitwise XOR
                ~^ or ^~    Bitwise XNOR

Logical         !           NOT
                &&          AND
                ||          OR

Reduction       &           Reduction AND
                ~&          Reduction NAND
                |           Reduction OR
                ~|          Reduction NOR
                ^           Reduction XOR
                ~^ or ^~    Reduction XNOR

Arithmetic      +           Addition
                -           Subtraction
                -           2's complement (unary)
                *           Multiplication
                /           Division
                **          Exponentiation (*Verilog-2001)

Relational      >           Greater than
                <           Less than
                >=          Greater than or equal to
                <=          Less than or equal to
                ==          Logical equality (bit-value 1'bX is removed from comparison)
                !=          Logical inequality (bit-value 1'bX is removed from comparison)
                ===         4-state logical equality (bit-value 1'bX is taken as literal)
                !==         4-state logical inequality (bit-value 1'bX is taken as literal)

Shift           >>          Logical right shift
                <<          Logical left shift
                >>>         Arithmetic right shift (*Verilog-2001)
                <<<         Arithmetic left shift (*Verilog-2001)

Concatenation   {,}         Concatenation

Replication     {n{m}}      Replicate value m for n times

Conditional     ?:          Conditional

Table: 4.1 Operators
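The difference between the 2-state and 4-state equality operators can be sketched with a small testbench (module and signal names are invented for the example):

```verilog
module eq_demo;
  reg [3:0] a = 4'b10x0;   // both vectors contain an unknown bit
  reg [3:0] b = 4'b10x0;
  initial begin
    #1;
    $display("==  gives %b", a == b);   // x : an unknown bit poisons the result
    $display("=== gives %b", a === b);  // 1 : X and Z are compared literally
  end
endmodule
```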

4.8 SYSTEM TASKS


System tasks are available to handle simple I/O, and various design measurement
functions. All system tasks are prefixed with $ to distinguish them from user tasks and
functions. This section presents a short list of the most often used tasks. It is by no means a
comprehensive list.

• $display - Print to screen a line followed by an automatic newline.
• $write - Write to screen a line without the newline.
• $swrite - Print to variable a line without the newline.
• $sscanf - Read from variable a format-specified string. (*Verilog-2001)
• $fopen - Open a handle to a file (read or write)
• $fdisplay - Write to file a line followed by an automatic newline.
• $fwrite - Write to file a line without the newline.
• $fscanf - Read from file a format-specified string. (*Verilog-2001)
• $fclose - Close and release an open file handle.
• $readmemh - Read hex file content into a memory array.
• $readmemb - Read binary file content into a memory array.
• $monitor - Print out all the listed variables when any change value.
• $time - Value of current simulation time.
• $dumpfile - Declare the VCD (Value Change Dump) format output file name.
• $dumpvars - Turn on and dump the variables.
• $dumpports - Turn on and dump the variables in Extended-VCD format.
• $random - Return a random value.
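A few of these tasks in use (the module name and signal are hypothetical; the VCD file name is arbitrary):

```verilog
module tasks_demo;
  reg [3:0] count;

  initial begin
    $dumpfile("waves.vcd");      // name the VCD output file
    $dumpvars(0, tasks_demo);    // dump every variable in this module
    $monitor("time=%0t count=%0d", $time, count);  // reprint on any change
    count = 0;
    repeat (3) #5 count = count + 1;
    $display("finished at %0t", $time);
    $finish;
  end
endmodule
```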


4.9 DIFFERENCE BETWEEN VERILOG & VHDL

Verilog and VHDL are Hardware Description languages that are used to write
programs for electronic chips. These languages are used in electronic devices that do
not share a computer‘s basic architecture. VHDL is the older of the two, and is based
on Ada and Pascal, thus inheriting characteristics from both languages. Verilog is
relatively recent, and follows the coding methods of the C programming language.

VHDL is a strongly typed language, and scripts that are not strongly typed are unable to compile. A strongly typed language like VHDL does not allow the intermixing, or operation, of variables with different classes. Verilog uses weak typing, which is the opposite of a strongly typed language. Another difference is the
case sensitivity. Verilog is case sensitive, and would not recognize a variable if the
case used is not consistent with what it was previously. On the other hand, VHDL is
not case sensitive, and users can freely change the case, as long as the characters in
the name, and the order, stay the same.

In general, Verilog is easier to learn than VHDL. This is due, in part, to the
popularity of the C programming language, making most programmers familiar with
the conventions that are used in Verilog. VHDL is a little bit more difficult to learn
and program.

VHDL has the advantage of having a lot more constructs that aid in high-level
modeling, and it reflects the actual operation of the device being programmed.
Complex data types and packages are very desirable when programming big and complex systems that might have a lot of functional parts. Verilog has no concept of packages, and all programming must be done with the simple data types that are provided by the language.

Lastly, Verilog lacks the library management of software programming languages. This means that Verilog will not allow programmers to put needed modules in separate files that are called during compilation. Large projects in Verilog might end up in a large, and difficult to trace, file.

1. Verilog is based on C, while VHDL is based on Pascal and Ada.

2. Unlike Verilog, VHDL is strongly typed.

3. Unlike VHDL, Verilog is case sensitive.

4. Verilog is easier to learn compared to VHDL.


5. Verilog has very simple data types, while VHDL allows users to create more
complex data types.

6. Verilog lacks the library management that VHDL has.

At present there are two industry-standard hardware description languages, VHDL and Verilog.

• The complexity of ASIC and FPGA designs has meant an increase in the number of specific tools and libraries of macro and mega cells written in either VHDL or Verilog.

• As a result, it is important that designers know both VHDL and Verilog and that EDA tools vendors provide tools that provide an environment allowing both languages to be used in unison.

VHDL (Very high speed integrated circuit Hardware Description Language) became IEEE standard 1076 in 1987.

• The Verilog hardware description language has been used far longer than VHDL and has been used extensively since it was launched by Gateway in 1983.


• Cadence bought Gateway in 1989 and opened Verilog to the public domain in 1990.
• Verilog became IEEE standard 1364 in December 1995.

Verilog – abstraction levels

• Verilog supports three main abstraction levels:


– Behavioral level – a system is described by a concurrent algorithm
– Register-transfer level – a system is characterised by operations and transfer of data between registers according to an explicit clock
– Gate level – a system is described by logical links and their timing characteristics


Verilog – 8-bit counter example

Basic constructs

Verilog – hierarchy

• Verilog structures which build the hierarchy are:


– modules
– ports

• A module is the basic unit of the model, and it may be composed of instances of other modules


Verilog – hierarchy

• the top level module is not instantiated by any other module


• Example:
module foo;

bar bee (port1, port2);


endmodule

module bar (port1, port2);
...
endmodule

Verilog

• Port types in Verilog:


– Input
– Output
– Inout


• Matching ports by names:

foo f1 (.bidi(bus), .out1(sink1), .in1(source1));


versus normal way

foo f1 (source1, , sink1, , bus);

Verilog - modules

• Verilog models are made up of modules


• Modules are made of different types
of components:
– Parameters
– Nets
– Registers
– Primitives and Instances
– Continuous Assignments
– Procedural Blocks

– Task/Function definitions

Verilog - parameters

• Parameters are constants whose values are determined at compile-time


• They are defined with the parameter statement:
parameter identifier = expression;
Example:
parameter width = 8, msb = 7, lsb = 0;

Verilog - nets

• Nets are the things that connect model components together – like signals in VHDL

• Nets are declared in statements like this:
net_type [range] [delay3] list_of_net_identifiers

• Example:
wire w1, w2;
tri [31:0] bus32;

Verilog – types of nets


• Each net type has functionality that is used to model different types of hardware such as CMOS, NMOS, TTL etc

Verilog – net drivers


• Nets are driven by net drivers.
• Drivers may be:
– output port of a primitive instance
– output port of a module instance
– left-hand side of a continuous assignment
• There may be more than one driver on a net

• If there is more than one driver, the value of the net is determined by a built-in resolution function

Verilog - registers

• Registers are storage elements


• Values are stored in registers in procedural assignment statements

• Registers can be used as the source for a primitive or module instance (i.e. registers can be connected to input ports), but they cannot be driven in the same way a net can
• Examples:
– reg r1, r2;

reg [31:0] bus32;


integer i;

• There are four types of registers:

– Reg:
• This is the generic register data type. A reg declaration can specify registers which are 1 bit wide to 1 million bits wide
– Integer


• Integers are 32-bit signed values


– Time
• Registers declared with the time keyword are 64-bit unsigned integers
– Real (and Realtime)
• Real registers are 64-bit IEEE floating point

Verilog - memories

• Verilog allows arrays of registers, called memories


• Memories are static, single-dimension arrays
• The format of a memory declaration is:
– reg [range] identifier range ;
• Example:

reg [0:31] temp, mem[1:1024];
...
temp = mem[10]; // extract 10th element
bit = temp[3];  // extract 3rd bit

Verilog - primitives

• Primitives are pre-defined module types


• The Verilog primitives are sometimes called gates, because for the most part, they are simple logical primitives
• Examples:
– and, nand, or, nor, xor, xnor
– buf, not
– pullup, pulldown
– bufif0, notif0

• Examples:

module test;
  wire n1, n2;
  reg ain, bin;

  and and_prim(n1, ain, bin);
  not not_prim(n2, n1);
endmodule

Procedural blocks

Verilog - Continuous assignments


• Continuous assignments are known as data flow statements
• They describe how data moves from one place, either a net or register, to another
• They are usually thought of as representing combinational logic
• Examples:

assign w1 = w2 & w3;
assign (strong1, pull0) mynet = enable;

Verilog - Procedural blocks

• Procedural blocks are the part of the language which represents sequential behavior
• A module can have as many procedural blocks as necessary
• These blocks are sequences of executable statements

• The statements in each block are executed sequentially, but the blocks themselves are concurrent and asynchronous to other blocks

• There are two types of procedural blocks, initial blocks and always blocks

• All initial and always blocks contain a single statement, which may be a compound statement, e.g.:
initial
begin statement1 ; statement2 ; ... end

Verilog - Initial blocks

• All initial blocks begin at time 0 and execute the initial statement

• Because the statement may be a compound statement, this may entail executing lots of statements

• An initial block may cause activity to occur throughout the entire simulation of the model

• When the initial statement finishes execution, the initial block terminates

• Examples:

initial x = 0; // a simple initialization

initial begin
  x = 1;    // an initialization
  y = f(x);
  #1 x = 0; // a value change 1 time unit later
  y = f(x);
end

Verilog – Always block


• Always blocks also begin at time 0

• The only difference between an always block and an initial block is that when the always statement finishes execution, it starts executing again

Tasks and functions

Verilog – Tasks/functions

• Tasks and functions are declared within modules

• Tasks may only be used in procedural blocks

• A task invocation is a statement by itself. It may not be used as an operand in an expression

• Functions are used as operands in expressions

• A function may be used in either a procedural block or a continuous assignment, or indeed, any place where an expression may appear

Verilog – Tasks


• Tasks may have zero or more arguments, and they may be input, output, or inout arguments
• Time can elapse during the execution of a task, according to time and event controls in the task definition
• Example:

task do_read;
  begin
    adbus_reg = addr; // put address out
  end
endtask

Verilog – Functions
• In contrast to tasks, functions must execute in a single instant of simulated time
• That is, no time or delay controls are allowed in a function
• Function arguments are also restricted to inputs only.
• Output and inout arguments are not allowed.
• The output of a function is indicated by an assignment to the function name
• Example:

function [15:0] relocate;
  input [11:0] addr;
  input [3:0]  relocation_factor;
  begin
    relocate = addr + (relocation_factor << 12);
    count = count + 1; // how many have we done
  end
endfunction

assign absolute_address = relocate(relative_address, rf);

VHDL/Verilog comparison

Capability

• VHDL – like Pascal or Ada programming languages


• Verilog – like C programming language


• It is important to remember that both are Hardware Description Languages and not programming languages

• For synthesis only a subset of each language is used

• Hardware structure can be modeled equally effectively in both VHDL and Verilog.

• When modeling abstract hardware, the capability of VHDL can sometimes only be achieved in Verilog when using the PLI.

• The choice of which to use is not therefore based solely on technical capability but on:
– personal preferences
– EDA tool availability
– commercial, business and marketing issues

• The modeling constructs of VHDL and Verilog cover a slightly different spectrum across the levels of behavioral abstraction

Compilation

• VHDL:

– Multiple design-units (entity/architecture pairs) that reside in the same system file may be separately compiled if so desired.
– It is good design practice to keep each design unit in its own system file, in which case separate compilation should not be an issue.
• Verilog:


– The Verilog language is still rooted in its native interpretative mode.
– Compilation is a means of speeding up simulation, but has not changed the original nature of the language.
– Care must be taken with both the compilation order of code written in a single file and the compilation order of multiple files.
– Simulation results can change by simply changing the order of compilation.


Data types
• VHDL:
– A multitude of language or user defined data types can be used.
– This may mean dedicated conversion functions are needed to convert objects from one type to another.
– The choice of which data types to use should be considered wisely, especially enumerated (abstract) data types.
– VHDL may be preferred because it allows a multitude of language or user defined data types to be used.

• Verilog:
– Compared to VHDL, Verilog data types are very simple, easy to use and very much geared towards modeling hardware structure as opposed to abstract hardware modeling.
– Unlike VHDL, all data types used in a Verilog model are defined by the Verilog language and not by the user.
– There are net data types, for example wire, and a register data type called reg.
– A model with a signal whose type is one of the net data types has a corresponding electrical wire in the implied modeled circuit.
– Verilog may be preferred because of its simplicity.

Design reusability
• VHDL:
– Procedures and functions may be placed in a package so that they are available to any design-unit that wishes to use them
• Verilog:
– There is no concept of packages in Verilog.
– Functions and procedures used within a model must be defined in the module.


– To make functions and procedures generally accessible from different module statements, the functions and procedures must be placed in a separate system file and included using the `include compiler directive.
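For example (the file and function names below are hypothetical):

```verilog
// shared_funcs.vh -- kept in its own system file:
//   function [3:0] inc_mod10;
//     input [3:0] v;
//     inc_mod10 = (v == 4'd9) ? 4'd0 : v + 4'd1;
//   endfunction

// Any module that needs the function pastes its text in with `include:
module counter10 (input clk, output reg [3:0] q);
  `include "shared_funcs.vh"
  always @(posedge clk)
    q <= inc_mod10(q);   // call the included function
endmodule
```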
Ease of learning

• Starting with zero knowledge of either language, Verilog is probably the easiest to grasp and understand.
• VHDL may seem less intuitive at first for two primary reasons:
– First, it is very strongly typed; a feature that makes it robust and powerful for the advanced user after a longer learning phase.
– Second, there are many ways to model the same circuit, especially those with large hierarchical structures.

High level constructs

• VHDL:

– There are more constructs and features for high-levelmodeling in VHDL than there
are in Verilog.

– Abstract data types can be used along with thefollowing statements:


• package statements for model reuse,
• configuration statements for configuring design structure,
• generate statements for replicating structure,

• generic statements for generic models that can beindividually characterized, for
example, bit width.
– All these language statements are useful insynthesizable models.
• Verilog:

– Except for being able to parameterize models byoverloading parameter constants,


there is no
equivalent to the high-level VHDL modeling statementsinVerilogLibraries
• VHDL:

– A library is a store for compiled entities, architectures, packages and configurations.
Useful for managing multiple design projects.
• Verilog:

– There is no concept of a library in Verilog. This is due to its origins as an
interpretive language.


Low level constructs


• VHDL:

– Simple two-input logical operators are built into the language; they are: NOT, AND,
OR, NAND, NOR, XOR and XNOR.
– Any timing must be separately specified using the after clause.
– Separate constructs defined under the VITAL language must be used to define the
cell primitives of ASIC and FPGA libraries.

• Verilog:

– The Verilog language was originally developed with gate-level modeling in mind,
and so has very good constructs for modeling at this level and for modeling the cell
primitives of ASIC and FPGA libraries.

– Examples include User Defined Primitives (UDP), truth tables and the specify block
for specifying timing delays across a module.
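A minimal sketch of a Verilog UDP described by a truth table (the primitive name is illustrative):

```verilog
// A 2-input AND modeled as a User Defined Primitive
primitive udp_and2 (y, a, b);
  output y;
  input  a, b;
  table
  //  a  b : y
      1  1 : 1;
      0  ? : 0;   // '?' matches 0, 1 or x
      ?  0 : 0;
  endtable
endprimitive
```

ASIC and FPGA cell libraries commonly combine UDPs like this with specify blocks that attach pin-to-pin delays.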
Managing large designs
• VHDL:

– Configuration, generate, generic and package statements all help manage large
design structures.

• Verilog:
– There are no statements in Verilog that help manage large designs.
Operators
• The majority of operators are the same between the two languages.
• Verilog does have very useful unary reduction operators that are not in VHDL.

• A loop statement can be used in VHDL to perform the same operation as a Verilog
unary reduction operator.
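For instance, Verilog's unary reduction operators collapse a vector into a single bit, which in VHDL would need a loop. A small illustrative module:

```verilog
module reduce_demo(input  [7:0] v,
                   output all_ones, any_one, parity);
  assign all_ones = &v;  // AND-reduce: 1 only if every bit of v is 1
  assign any_one  = |v;  // OR-reduce:  1 if any bit of v is 1
  assign parity   = ^v;  // XOR-reduce: odd parity of the vector
endmodule
```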

• VHDL has the mod operator that is not found in Verilog.

Procedures and tasks

• VHDL:
– concurrent procedure calls are allowed
• Verilog:
– concurrent procedure calls are not allowed

Readability

• This is more a matter of coding style and experience than language feature.


• VHDL is a concise and verbose language;


• Verilog is more like C because its constructs are based approximately 50% on C and
50% on Ada.
• For this reason an existing C programmer may prefer Verilog over VHDL.

• Whatever HDL is used, when writing or reading an HDL model to be synthesized it
is important to think about hardware intent.

Structural replication
• VHDL:

– The generate statement replicates a number of instances of the same design-unit or
some sub-part of a design, and connects it appropriately.

• Verilog:

– There is no equivalent to the generate statement in Verilog.


Verboseness

• VHDL:

– Because VHDL is a very strongly typed language, models must be coded precisely
with defined and matching data types.
– This may be considered an advantage or disadvantage.

– It does mean models are often more verbose, and the code often longer, than their
Verilog equivalents.
• Verilog:
– Signals representing objects of different bit widths may be assigned to each other.

– The signal representing the smaller number of bits is automatically padded out to
that of the larger number of bits, independent of whether it is the assigned signal
or not.
– Unused bits will be automatically optimized away during the synthesis process.

– This has the advantage of not needing to model quite so explicitly as in VHDL, but
does mean unintended modeling errors will not be identified by an analyzer.
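A tiny illustrative module showing the implicit padding described above:

```verilog
module pad_demo(input  [3:0] small,
                output [7:0] wide);
  // The 4-bit source is implicitly zero-extended to 8 bits;
  // the analyzer accepts the width mismatch without complaint.
  assign wide = small;
endmodule
```

The same assignment in VHDL would be rejected until the widths are made to match explicitly.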

Examples

Binary up counter
• VHDL:


process (clock)
begin

if clock='1' and clock'event then


counter <= counter + 1;
end if;
end process;

• Verilog:

reg [upper:0] counter;

always @(posedge clock)

counter <= counter + 1;

D Flip-Flop

• VHDL:

process (<clock>)
begin
if <clock>'event and <clock>='1' then
<output> <= <input>;
end if;
end process;

• Verilog:

always @(posedge <clock>) begin
<reg> <= <signal>;
end

Synchronous multiplier

• VHDL:

process (<clock>)
begin
if <clock>='1' and <clock>'event then
<output> <= <input1> * <input2>;
end if;
end process;

• Verilog:
wire [17:0] <a_input>;

wire [17:0] <b_input>;

reg [35:0] <product>;

always @(posedge<clock>)

<product> <= <a_input> * <b_input>;


CHAPTER 5

XILINX

5.1 MIGRATING PROJECTS FROM PREVIOUS ISE SOFTWARE


RELEASES
When you open a project file from a previous release, the ISE® software
prompts you to migrate your project. If you click Backup and Migrate or Migrate
only, the software automatically converts your project file to the current release. If
you click Cancel, the software does not convert your project and, instead, opens
Project Navigator with no project loaded.

Note: After you convert your project, you cannot open it in previous versions of the
ISE software, such as the ISE 11 software. However, you can optionally create a
backup of the original project as part of project migration, as described below.

5.1.1 To Migrate A Project

i. In the ISE 12 Project Navigator, select File > Open Project.


ii. In the Open Project dialog box, select the .xise file to migrate.

Note You may need to change the extension in the Files of type field to
display .npl (ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10
software) project files.

iii. In the dialog box that appears, select Backup and Migrate or Migrate Only.
iv. The ISE software automatically converts your project to an ISE 12 project.

Note If you chose to Backup and Migrate, a backup of the original project is
created at project_name_ise12migration.zip.

v. Implement the design using the new version of the software.

Note Implementation status is not maintained after migration.

5.2 PROPERTIES

For information on properties that have changed in the ISE 12 software, see
ISE 11 to ISE 12 Properties Conversion.

5.2.1 IP Modules

If your design includes IP modules that were created using CORE Generator™
software or Xilinx® Platform Studio (XPS) and you need to modify these modules,
you may be required to update the core. However, if the core net list is present and


you do not need to modify the core, updates are not required and the existing net list
is used during implementation.

5.3 OBSOLETE SOURCE FILE TYPES


The ISE 12 software supports all of the source types that were supported in the
ISE 11 software.
If you are working with projects from previous releases, state diagram source
files (.dia), ABEL source files (.abl), and test bench waveform source files (.tbw) are
no longer supported. For state diagram and ABEL source files, the software finds an
associated HDL file and adds it to the project, if possible. For test bench waveform
files, the software automatically converts the TBW file to an HDL test bench and adds
it to the project. To convert a TBW file after project migration, see Converting a TBW
File to an HDL Test Bench.

5.3.1 Using ISE Example Projects

To help familiarize you with the ISE software and with FPGA and CPLD
designs, a set of example designs is provided with Project Navigator. The examples
show different design techniques and source types, such as VHDL, Verilog,
schematic, or EDIF, and include different constraints and IP.

To Open an Example

i. Select File > Open Example.


ii. In the Open Example dialog box, select the Sample Project Name.

Note To help you choose an example project, the Project Description field describes
each project. In addition, you can scroll to the right to see additional fields, which
provide details about the project.

iii. In the Destination Directory field, enter a directory name or browse to the
directory.
iv. Click OK.

The example project is extracted to the directory you specified in the


Destination Directory field and is automatically opened in Project Navigator. You
can then run processes on the example project and save any changes.

Note If you modified an example project and want to overwrite it with the
original example project, select File > Open Example, select the Sample Project
Name, and specify the same Destination Directory you originally used. In the dialog
box that appears, select Overwrite the existing project and click OK.


5.3.2 Creating A Project:

Project Navigator allows you to manage your FPGA and CPLD designs using
an ISE® project, which contains all the source files and settings specific to your
design. First, you must create a project and then, add source files, and set process
properties. After you create a project, you can run processes to implement, constrain,
and analyze your design. Project Navigator provides a wizard to help you create a
project as follows.

Note If you prefer, you can create a project using the New Project dialog box instead
of the New Project Wizard. To use the New Project dialog box, deselect the
Use New Project wizard option in the ISE General page of the Preferences dialog
box.

To Create A Project

1. Select File > New Project to launch the New Project Wizard.
2. In the page, set the name, location, and project type, and click Next.
3. For EDIF or NGC/NGO projects only: In the Project page, select the input and
constraint file for the project, and click Next.
4. In the Project Settings page, set the device and project properties, and click
Next.
5. In the Project Summary page, review the information, and click Finish to
create the project.

Project Navigator creates the project file (project_name.xise) in the directory


you specified. After you add source files to the project, the files appear in the
Hierarchy Pane.
Project Navigator manages your project based on the design properties (top-
level module type, device type, synthesis tool, and language) you selected when you
created the project. It organizes all the parts of your design and keeps track of the
processes necessary to move the design from design entry through implementation to
programming the targeted Xilinx® device.

Note For information on changing design properties, see Changing Properties.

You can now perform any of the following:

o Create new source files for your project.


o Add existing source files to your project.

o Run processes on your source files.

o Modify process properties.

Creating a Copy of a Project:

You can create a copy of a project to experiment with different source options
and implementations. Depending on your needs, the design source files for the copied
project and their location can vary as follows:

o Design source files are left in their existing location, and the copied project
points to these files.
o Design source files, including generated files, are copied and placed in a specified
directory.
o Design source files, excluding generated files, are copied and placed in a specified
directory.

Copied projects are the same as other projects in both form and function. For
example, you can do the following with copied projects:

∑ Open the copied project using the File > Open Project menu command.
∑ View, modify, and implement the copied project.
∑ Use the Project Browser to view key summary data for the copied project and then,
open the copied project for further analysis and implementation, as described in
Using the Project Browser.

5.4 USING THE PROJECT BROWSER

Alternatively, you can create an archive of your project, which puts all of the
project contents into a ZIP file. Archived projects must be unzipped before being
opened in Project Navigator. For information on archiving, see Creating a Project
Archive.

To Create A Copy Of A Project

1. Select File > Copy Project.

2. In the Copy Project dialog box, enter the Name for the copy.

Note The name for the copy can be the same as the name for the project, as long as
you specify a different location.

3. Enter a directory Location to store the copied project.


4. Optionally, enter a Working directory.

By default, this is blank, and the working directory is the same as the project
directory. However, you can specify a working directory if you want to keep your
ISE® project file (.xise extension) separate from your working area.

5. Optionally, enter a Description for the copy.

The description can be useful in identifying key traits of the project for reference
later.

6. In the Source options area, do the following:

Select one of the following options:

∑ Keep sources in their current locations - to leave the design source files in their
existing location.

If you select this option, the copied project points to the files in their existing
location. If you edit the files in the copied project, the changes also appear in the
original project, because the source files are shared between the two projects.

∑ Copy sources to the new location - to make a copy of all the design source files
and place them in the specified Location directory.

If you select this option, the copied project points to the files in the specified
directory. If you edit the files in the copied project, the changes do not appear in the
original project, because the source files are not shared between the two projects.

Optionally, select Copy files from Macro Search Path directories to copy
files from the directories you specify in the Macro Search Path property in the
Translate dialog box. All files from the specified directories are copied, not just the
files used by the design.

Note: If you added a net list source file directly to the project as described in

Working with Net list-Based IP, the file is automatically copied as part of
Copy Project because it is a project source file. Adding net list source files to the
project is the preferred method for incorporating net list modules into your design,
because the files are managed automatically by Project Navigator.

Optionally, click Copy Additional Files to copy files that were not included
in the original project. In the Copy Additional Files dialog box, use the Add Files and
Remove Files buttons to update the list of additional files to copy. Additional files are
copied to the copied project location after all other files are copied. To exclude
generated files from the copy, such as implementation results and reports, select
Exclude Generated Files From The Copy. When you select this option, the copied
project opens in a state in which processes have not yet been run.

7. To automatically open the copy after creating it, select Open the copied project.

Note By default, this option is disabled. If you leave this option disabled, the original
project remains open after the copy is made.

8. Click OK.

5.5 CREATING A PROJECT ARCHIVE

A project archive is a single, compressed ZIP file with a .zip extension. By


default, it contains all project files, source files, and generated files, including the
following:

∑ User-added sources and associated files


∑ Remote sources
∑ Verilog `include files
∑ Files in the macro search path
∑ Generated files
∑ Non-project files

To Archive A Project:

1. Select Project > Archive.

2. In the Project Archive dialog box, specify a file name and directory for the
ZIP file.

3. Optionally, select Exclude generated files from the archive to exclude


generated files and non-project files from the archive.

4. Click OK.

A ZIP file is created in the specified directory. To open the archived project,
you must first unzip the ZIP file, and then, you can open the project.

Note Sources that reside outside of the project directory are copied into a remote
sources subdirectory in the project archive.


CHAPTER 6

SIMULATION RESULTS
RTL SCHEMATIC

INTERNAL STRUCTURE OF RTL SCHEMATIC


TECHNOLOGY SCHEMATIC

INTERNAL STRUCTURE OF TECHNOLOGY SCHEMATIC


DESIGN SUMMARY

SYNTHESIS REPORT

--> Parameter xsthdpdir set to xst

Total REAL time to Xst completion: 0.00 secs

Total CPU time to Xst completion: 0.34 secs

--> Reading design: proposedcodewords.prj

TABLE OF CONTENTS

1) Synthesis Options Summary

2) HDL Compilation


3) Design Hierarchy Analysis

4) HDL Analysis

5) HDL Synthesis

5.1) HDL Synthesis Report

6) Advanced HDL Synthesis

6.1) Advanced HDL Synthesis Report

7) Low Level Synthesis

8) Partition Report

9) Final Report

9.1) Device utilization summary

9.2) Partition Resource Summary

9.3) TIMING REPORT

=========================================================================

* Synthesis Options Summary *

=========================================================================

---- Source Parameters

Input File Name : "proposedcodewords.prj"

Input Format : mixed


Ignore Synthesis Constraint File : NO

---- Target Parameters

Output File Name : "proposedcodewords"

Output Format : NGC

Target Device : xc3s100e-5-vq100

---- Source Options

Top Module Name : proposedcodewords

Automatic FSM Extraction : YES


FSM Encoding Algorithm : Auto

Safe Implementation : No

FSM Style : LUT

RAM Extraction : Yes


RAM Style : Auto

ROM Extraction : Yes

Mux Style : Auto

Decoder Extraction : YES

Priority Encoder Extraction : Yes


Shift Register Extraction : YES

Logical Shifter Extraction : YES

XOR Collapsing : YES

ROM Style : Auto

Mux Extraction : Yes

Resource Sharing : YES


Asynchronous To Synchronous : NO

Multiplier Style : Auto


Automatic Register Balancing : No

---- Target Options

Add IO Buffers : YES


Global Maximum Fanout : 500

Add Generic Clock Buffer(BUFG) : 24

Register Duplication : YES

Slice Packing : YES


Optimize Instantiated Primitives : NO

Use Clock Enable : Yes

Use Synchronous Set : Yes


Use Synchronous Reset : Yes

Pack IO Registers into IOBs : Auto

Equivalent register Removal : YES

---- General Options

Optimization Goal : Speed

Optimization Effort : 1

Keep Hierarchy : No

Netlist Hierarchy : As_Optimized

RTL Output : Yes


Global Optimization : AllClockNets

Read Cores : YES


Write Timing Constraints : NO

Cross Clock Analysis : NO

Hierarchy Separator : /

Bus Delimiter : <>

Case Specifier : Maintain


Slice Utilization Ratio : 100

BRAM Utilization Ratio : 100

Verilog 2001 : YES


Auto BRAM Packing : NO

Slice Utilization Ratio Delta : 5

=======================================================================

=========================================================================

* HDL Compilation *

=========================================================================

Compiling verilog file "xor_gate.v" in library work

Compiling verilog file "or_gate.v" in library work

Module <xor_gate> compiled

Compiling verilog file "half_adder.v" in library work

Module <or_gate> compiled

Compiling verilog file "decision.v" in library work

Module <half_adder> compiled

Compiling verilog file "proposedcodewords.v" in library work

Module <decision> compiled

Module <proposedcodewords> compiled

No errors in compilation

Analysis of file <"proposedcodewords.prj"> succeeded.

=========================================================================

* Design Hierarchy Analysis *

=========================================================================

Analyzing hierarchy for module <proposedcodewords> in library <work>.

Analyzing hierarchy for module <xor_gate> in library <work>.

Analyzing hierarchy for module <half_adder> in library <work>.

Analyzing hierarchy for module <or_gate> in library <work>.

Analyzing hierarchy for module <decision> in library <work>.


=========================================================================

* HDL Analysis *

=========================================================================

Analyzing top module <proposedcodewords>.

Module <proposedcodewords> is correct for synthesis.

Analyzing module <xor_gate> in library <work>.

Module <xor_gate> is correct for synthesis.

Analyzing module <half_adder> in library <work>.

Module <half_adder> is correct for synthesis.

Analyzing module <or_gate> in library <work>.

Module <or_gate> is correct for synthesis.

Analyzing module <decision> in library <work>.

Module <decision> is correct for synthesis.


* HDL Synthesis *
=========================================================================

Performing bidirectional port resolution...

Synthesizing Unit <xor_gate>.

Related source file is "xor_gate.v".

Found 1-bit xor2 for signal <y>.

Unit <xor_gate> synthesized.

Synthesizing Unit <half_adder>.

Related source file is "half_adder.v".

Found 1-bit xor2 for signal <sum>.

Unit <half_adder> synthesized.

Synthesizing Unit <or_gate>.

Related source file is "or_gate.v".

Unit <or_gate> synthesized.

Synthesizing Unit <decision>.

Related source file is "decision.v".

WARNING:Xst:737 - Found 1-bit latch for signal <miss>. Latches may be generated from incomplete case or if
statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing
problems.

INFO:Xst:2371 - HDL ADVISOR - Logic functions respectively driving the data and gate enable inputs of this
latch share common terms. This situation will potentially lead to setup/hold violations and, as a result, to
simulation problems. This situation may come from an incomplete case statement (all selector values are not
covered). You should carefully review if it was in your intentions to describe such a latch.

WARNING:Xst:737 - Found 1-bit latch for signal <match>. Latches may be generated from incomplete case or
if statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing
problems.

INFO:Xst:2371 - HDL ADVISOR - Logic functions respectively driving the data and gate enable inputs of this
latch share common terms. This situation will potentially lead to setup/hold violations and, as a result, to
simulation problems. This situation may come from an incomplete case statement (all selector values are not
covered). You should carefully review if it was in your intentions to describe such a latch.

WARNING:Xst:737 - Found 1-bit latch for signal <fault>. Latches may be generated from incomplete case or if
statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing
problems.

INFO:Xst:2371 - HDL ADVISOR - Logic functions respectively driving the data and gate enable inputs of this
latch share common terms. This situation will potentially lead to setup/hold violations and, as a result, to
simulation problems. This situation may come from an incomplete case statement (all selector values are not
covered). You should carefully review if it was in your intentions to describe such a latch.

Unit <decision> synthesized.

Synthesizing Unit <proposedcodewords>.

Related source file is "proposedcodewords.v".

Unit <proposedcodewords> synthesized.
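The three latch warnings above are typical of conditionals that do not assign on every path. A hedged sketch of the usual remedy (signal names are illustrative, not taken from decision.v):

```verilog
// Latch inferred: 'match' must hold its value whenever en is low.
// always @* if (en) match = hit;

// No latch: a default assignment covers every path, so the
// synthesizer produces purely combinational logic instead.
always @* begin
  match = 1'b0;        // default assignment removes the latch
  if (en)
    match = hit;
end
```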

=========================================================================

HDL Synthesis Report

Macro Statistics
# Latches : 3
 1-bit latch : 3
# Xors : 20
 1-bit xor2 : 20

=========================================================================
=========================================================================

* Advanced HDL Synthesis *

=========================================================================
=========================================================================

Advanced HDL Synthesis Report

Macro Statistics
# Latches : 3
 1-bit latch : 3
# Xors : 20
 1-bit xor2 : 20


=========================================================================
=========================================================================

* Low Level Synthesis *


=========================================================================

Optimizing unit <proposedcodewords> ...


Optimizing unit <decision> ...
Mapping all equations...

Building and optimizing final netlist ...

Found area constraint ratio of 100 (+ 5) on block proposedcodewords, actual ratio is 1.

Final Macro Processing ...


=========================================================================

Final Register Report

Found no macro

=========================================================================
=========================================================================

* Partition Report *
=========================================================================

Partition Implementation Status

-------------------------------
No Partitions were found in this design.


=========================================================================

* Final Report *

=========================================================================

Final Results

RTL Top Level Output File Name : proposedcodewords.ngr

Top Level Output File Name : proposedcodewords

Output Format : NGC

Optimization Goal : Speed

Keep Hierarchy : No

Design Statistics

# IOs : 16

Cell Usage :

# BELS : 24
#   GND : 1
#   LUT2 : 6
#   LUT3 : 3
#   LUT4 : 14
# FlipFlops/Latches : 3
#   LD : 3
# IO Buffers : 16
#   IBUF : 13
#   OBUF : 3

=========================================================================

Device utilization summary:

---------------------------

Selected Device : 3s100evq100-5

Number of Slices: 13 out of 960 1%

Number of 4 input LUTs: 23 out of 1920 1%

Number of IOs: 16


Number of bonded IOBs: 16 out of 66 24%

IOB Flip Flops: 3

---------------------------

Partition Resource Summary:

---------------------------

No Partitions were found in this design.

--------------------------

=========================================================================

TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.

FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-------------------------------------+-----------------------+------+
Clock Signal                         | Clock buffer(FF name) | Load |
-------------------------------------+-----------------------+------+
u23/miss_not0001(u23/miss_not00011:O)| NONE(*)(u23/miss)     | 3    |
-------------------------------------+-----------------------+------+
(*) This 1 clock signal(s) are generated by combinatorial logic,
and XST is not able to identify which are the primary clock signals.

Please use the CLOCK_SIGNAL constraint to specify the clock signal(s) generated by combinatorial
logic.

INFO:Xst:2169 - HDL ADVISOR - Some clock signals were not automatically buffered by XST with
BUFG/BUFR resources. Please use the buffer_type constraint in order to insert these buffers to the
clock signals to help prevent skew problems.

Asynchronous Control Signals Information:

----------------------------------------


No asynchronous control signals found in this design

Timing Summary:

---------------

Speed Grade: -5

Minimum period: No path found

Minimum input arrival time before clock: 7.201ns

Maximum output required time after clock: 4.114ns

Maximum combinational path delay: No path found

Timing Detail:

--------------
All values displayed in nanoseconds (ns)

=========================================================================

Timing constraint: Default OFFSET IN BEFORE for Clock 'u23/miss_not0001'

Total number of paths / destination ports: 299 / 3

-------------------------------------------------------------------------

Offset: 7.201ns (Levels of Logic = 6)

Source:            k<2> (PAD)
Destination:       u23/miss (LATCH)
Destination Clock: u23/miss_not0001 falling

Data Path: k<2> to u23/miss

Cell:in->out     fanout   Gate    Net     Logical Name (Net Name)
                          Delay   Delay
---------------------------------------- ------------
IBUF:I->O           2     1.106   0.532   k_2_IBUF (k_2_IBUF)
LUT2:I0->O          4     0.612   0.651   u3/Mxor_y_Result1 (w3)
LUT4:I0->O          2     0.612   0.532   u18/Mxor_sum_Result1 (w25)
LUT3:I0->O          2     0.612   0.449   u21/carry1 (t4)
LUT4:I1->O          3     0.612   0.603   u23/match_mux000011 (u23/N2)
LUT3:I0->O          1     0.612   0.000   u23/match_mux00002 (u23/match_mux0000)


LD:D 0.268 u23/match


----------------------------------------
Total 7.201ns (4.434ns logic, 2.767ns route)
(61.6% logic, 38.4% route)

=========================================================================

Timing constraint: Default OFFSET OUT AFTER for Clock 'u23/miss_not0001'

Total number of paths / destination ports: 3 / 3

-------------------------------------------------------------------------

Offset: 4.114ns (Levels of Logic = 1)

Source: u23/fault (LATCH)

Destination: fault (PAD)

Source Clock: u23/miss_not0001 falling

Data Path: u23/fault to fault

Cell:in->out     fanout   Gate    Net     Logical Name (Net Name)
                          Delay   Delay

---------------------------------------- ------------

LD:G->Q 1 0.588 0.357 u23/fault (u23/fault)

OBUF:I->O 3.169 fault_OBUF (fault)

----------------------------------------

Total 4.114ns (3.757ns logic, 0.357ns route)

(91.3% logic, 8.7% route)

=========================================================================

Total REAL time to Xst completion: 11.00 secs

Total CPU time to Xst completion: 11.28 secs

-->

Total memory usage is 191416 kilobytes


Number of errors   : 0 (0 filtered)
Number of warnings : 3 (0 filtered)
Number of infos    : 4 (0 filtered)

SIMULATION RESULTS


FUTURE SCOPE
In addition, an efficient processing architecture has been presented to further
minimize the latency and complexity. Since the proposed architecture is effective in
reducing the latency as well as the complexity considerably, it can be regarded as a
promising solution for the comparison of ECC-protected data. As the scope of this
project, we formulate the DMC technique to assure the consistency in memory and to
reduce latency and complexity.

CONCLUSION

In this process, we formulate the DMC technique to assure the consistency in
memory. The proposed protection code utilizes a decimal algorithm to detect errors, so
that more errors can be detected and corrected. To reduce the hardware complexity and
latency, a new architecture has been presented for matching the data protected with an
ECC. To reduce the latency, the comparison of the data is parallelized with the
encoding process that generates the parity information. The parallel operations are
enabled based on the fact that the systematic codeword has separate fields for the data
and parity. In addition, an efficient processing architecture has been presented to
further minimize the latency and complexity. Consequently, a sensible reduction in
power is accomplished with the proposed design.


REFERENCES
[1] J. D. Warnock, Y. H. Chan, S. M. Carey, H. Wen, P. J. Meaney, G. Gerwig,
H. H. Smith, Y. H. Chan, J. Davis, P. Bunce, A. Pelella, D. Rodko, P. Patel, T. Strach,
D. Malone, F. Malgioglio, J. Neves, D. L. Rude, and W. V. Huott, "Circuit and
physical design implementation of the microprocessor chip for the zEnterprise
system," IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 151-163, Jan. 2012.

[2] B. Y. Kong, J. Jo, H. Jeong, M. Hwang, S. Cha, B. Kim, and I.-C. Park,
"Low-Complexity Low-Latency Architecture for Matching of Data Encoded With
Hard Systematic Error-Correcting Codes," IEEE Trans. Very Large Scale Integr.
(VLSI) Syst., vol. 22, no. 7, pp. 1648-1652, Jul. 2014.

[3] H. Ando, Y. Yoshida, A. Inoue, I. Sugiyama, T. Asakawa, T. Muta, K. Morita,
T. Motokurumada, S. Okada, H. Yamashita, Y. Satsukawa, A. Konmoto, R.
Yamashita, and H. Sugiyama, "A 1.3 GHz fifth generation SPARC64
microprocessor," in ISSCC Dig. Tech. Papers, 2003, pp. 246-247.

[4] AMD Inc., Sunnyvale, CA, "Family 10h AMD Opteron Processor Product
Data Sheet," PID: 40036, Rev: 3.04, 2010. [Online]. Available:
http://support.amd.com/us/Processor_TechDocs/40036.pdf

[5] W. Wu, D. Somasekhar, and S.-L. Lu, "Direct compare of information coded
with error-correcting codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.
20, no. 11, pp. 2147-2151, Nov. 2012.

[6] D. Radaelli, H. Puchner, S. Wong, and S. Daniel, "Investigation of multi-bit
upsets in a 150 nm technology SRAM device," IEEE Trans. Nucl. Sci., vol. 52, no. 6,
pp. 2433-2437, Dec. 2005.

[7] E. Ibe, H. Taniguchi, Y. Yahagi, K. Shimbo, and T. Toba, "Impact of scaling
on induced soft error in SRAMs from a 250 nm to a 22 nm design rule," IEEE Trans.
Electron Devices, vol. 57, no. 7, pp. 1527-1538, Jul. 2010.


APPENDIX A
SOURCE CODE
module tb_bwa_g;

// Inputs
reg [3:0] a;
reg [3:0] b;

// Outputs
wire [7:0] out;

// Instantiate the Unit Under Test (UUT)
bwa_g uut (
.a(a),
.b(b),
.out(out)
);

initial begin
// Initialize Inputs a =
4'hf;
b = 4'hf;
#100;
a = 4'h8;
b = 4'ha;

end
endmodule

module tb_bwa_n;

// Inputs
reg [3:0] a;
reg [3:0] b;

// Outputs
wire [2:0] out;

// Instantiate the Unit Under Test (UUT)
bwa_n uut (
.a(a),
.b(b),
.out(out)
);

initial begin
// Initialize inputs
a = 4'hf;
b = 4'hf;
#100;
a = 4'h8;
b = 4'ha;
end

endmodule

module tb_code8_4;

// Inputs
reg [3:0] a;
reg [3:0] b;

// Outputs
wire [5:0] out;

// Instantiate the Unit Under Test (UUT)
code8_4 uut (
.a(a),
.b(b),
.out(out)
);

initial begin
// Initialize inputs
a = 4'hf;
b = 4'hf;
#100;
a = 4'h8;
b = 4'ha;
end

endmodule

module tb_decision;

// Inputs
reg a;
reg b;
reg c;
reg d;
reg e;
reg f;
reg en;

// Outputs
wire miss;
wire fault;
wire match;

// Instantiate the Unit Under Test (UUT)
decision uut (
.a(a),
.b(b),
.c(c),
.d(d),
.e(e),
.f(f),
.miss(miss),
.fault(fault),
.match(match),
.en(en)
);

integer i;

initial begin
en = 1;
#20;
en = 0;
end

initial begin
{a,b,c} = 3'b000;
#350;
{a,b,c} = 3'b111;
end

initial begin
for (i = 0; i < 16; i = i + 1) begin
#20;
{d,e,f} = i;
#20;
end
end

endmodule

module tb_proposedcodewords;

// Inputs
reg [7:0] n;
reg [3:0] k;
reg en;

// Outputs
wire miss, match, fault;

// Instantiate the Unit Under Test (UUT)
proposedcodewords uut (
.n(n),
.k(k),
.miss(miss),
.match(match),
.fault(fault),
.en(en)
);

initial begin
en = 1'b1;
#20;
en = 1'b0;
end

initial begin
n = 8'hff;
k = 4'hf;
#100;
n = 8'hd9;
k = 4'h7;
#100;
n = 8'h00;
k = 4'h5;
end

endmodule

module tb_sa_arch;

// Inputs
reg [7:0] a;
reg [7:0] b;

// Outputs
wire sum;

// Instantiate the Unit Under Test (UUT)
sa_arch uut (
.a(a),
.b(b),
.sum(sum)
);

initial begin
// Initialize inputs
a = 8'h12;
b = 8'h34;
#100;
a = 8'hab;
b = 8'hcd;
#100;
a = 8'hff;
b = 8'hff;
end

endmodule

module xor_gate(a, b, y);
input a, b;
output y;
assign y = a ^ b;
endmodule

module half_adder(a, b, sum, carry);
input a, b;
output sum, carry;
assign sum = a ^ b;
assign carry = a & b;
endmodule

module decision(a, b, c, d, e, f, miss, fault, match, en);
input a, b, c, d, e, f;
input en;
output reg miss, match, fault;

// {a,b,c} are the upper and {d,e,f} the lower bits of the Hamming-distance
// count: a distance of 0-1 is a match (exact or correctable), 2-4 is a
// detectable fault, and 5 or more (or any nonzero upper bits) is a miss.
always @(*)
begin
if (en)
begin
miss = 0;
match = 0;
fault = 0;
end
else if ({a,b,c} == 3'b000 && {d,e,f} <= 3'b001)
begin
match = 1'b1;
miss = 1'b0;
fault = 1'b0;
end
else if ({a,b,c} == 3'b000 && {d,e,f} <= 3'b100)
begin
match = 1'b0;
miss = 1'b0;
fault = 1'b1;
end
else // {d,e,f} >= 5, or {a,b,c} nonzero
begin
match = 1'b0;
miss = 1'b1;
fault = 1'b0;
end
end

endmodule

module or_gate(a, b, y);
input a, b;
output y;
assign y = a || b;
endmodule

module sa(a, b, sum);
input a, b;
output sum;
assign sum = a ^ b;
endmodule

module bwa_g(a, b, out);
input [3:0] a, b;
output [7:0] out;

wire w1, w2, w3, w4, w5, w6, w7, w8, w9,
w10, w11, w12, w13, w14, w15, w16;
half_adder u1(.a(a[0]),.b(b[0]),.sum(w1),.carry(w2));
half_adder u2(.a(a[1]),.b(b[1]),.sum(w3),.carry(w4));
half_adder u3(.a(a[2]),.b(b[2]),.sum(w5),.carry(w6));
half_adder u4(.a(a[3]),.b(b[3]),.sum(w7),.carry(w8));
half_adder u5(.a(w1),.b(w3),.sum(w9),.carry(w10));
half_adder u6(.a(w5),.b(w7),.sum(w11),.carry(w12));
half_adder u7(.a(w2),.b(w4),.sum(w13),.carry(w14));
half_adder u8(.a(w6),.b(w8),.sum(w15),.carry(w16));
half_adder u9(.a(w9),.b(w11),.sum(out[0]),.carry(out[1]));
half_adder u10(.a(w13),.b(w15),.sum(out[2]),.carry(out[3]));
half_adder u11(.a(w10),.b(w12),.sum(out[4]),.carry(out[5]));
half_adder u12(.a(w14),.b(w16),.sum(out[6]),.carry(out[7]));

endmodule
module code8_4(a,b,out);
input [3:0]a,b;
output [5:0]out;
wire w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,
w12,w13,w14,w15,w16,w17,w18,w19,w20;

half_adder u1(.a(a[0]),.b(b[0]),.sum(w1),.carry(w2));
half_adder u2(.a(a[1]),.b(b[1]),.sum(w3),.carry(w4));
half_adder u3(.a(a[2]),.b(b[2]),.sum(w5),.carry(w6));
half_adder u4(.a(a[3]),.b(b[3]),.sum(w7),.carry(w8));

half_adder u5(.a(w1),.b(w3),.sum(w9),.carry(w10));
half_adder u6(.a(w5),.b(w7),.sum(w11),.carry(w12));
half_adder u7(.a(w2),.b(w4),.sum(w13),.carry(w14));
half_adder u8(.a(w6),.b(w8),.sum(w15),.carry(w16));

half_adder u9(.a(w9),.b(w13),.sum(out[0]),.carry(out[1]));
half_adder u10(.a(w10),.b(w11),.sum(w17),.carry(w18));
half_adder u11(.a(w14),.b(w15),.sum(w19),.carry(w20));
or_gate u12(.a(w12),.b(w16),.y(out[5]));

half_adder u13(.a(w17),.b(w19),.sum(out[2]),.carry(out[3]));
or_gate u14(.a(w18),.b(w20),.y(out[4]));

endmodule

module bwa_n(a, b, out);
input [3:0] a, b;
output [2:0] out;

// w11 added to the declaration list; it is driven by u6 below
wire w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12, w13, w14, w15, w16;

half_adder u1(.a(a[0]),.b(b[0]),.sum(w1),.carry(w2));
half_adder u2(.a(a[1]),.b(b[1]),.sum(w3),.carry(w4));
half_adder u3(.a(a[2]),.b(b[2]),.sum(w5),.carry(w6));
half_adder u4(.a(a[3]),.b(b[3]),.sum(w7),.carry(w8));

half_adder u5(.a(w1),.b(w3),.sum(w9),.carry(w10));
half_adder u6(.a(w5),.b(w7),.sum(w11),.carry(w12));
or_gate u7(.a(w2),.b(w4),.y(w13));
or_gate u8(.a(w6),.b(w8),.y(w14));

half_adder u9(.a(w9),.b(w11),.sum(out[0]),.carry(out[1]));
or_gate u10(.a(w10),.b(w12),.y(w15));
or_gate u11(.a(w13),.b(w14),.y(w16));
or_gate u12(.a(w15),.b(w16),.y(out[2]));

endmodule


module proposedcodewords(n, k, miss, match, fault, en);
input [7:0] n;
input [3:0] k;
input en;
output miss, match, fault;
wire w1, w2, w3, w4, w5, w6, w7, w8, w9, w10,
w11, w12, w13, w14, w15, w16, w17, w18,
w19, w20, w21, w22, w23, w24, w25, w26, w27, w28;
wire t1, t2, t3, t4, t5, t6;


xor_gate u1(.a(n[0]),.b(k[0]),.y(w1));
xor_gate u2(.a(n[1]),.b(k[1]),.y(w2));
xor_gate u3(.a(n[2]),.b(k[2]),.y(w3));
xor_gate u4(.a(n[3]),.b(k[3]),.y(w4));
xor_gate u5(.a(n[4]),.b(k[0]),.y(w5));
xor_gate u6(.a(n[5]),.b(k[1]),.y(w6));
xor_gate u7(.a(n[6]),.b(k[2]),.y(w7));
xor_gate u8(.a(n[7]),.b(k[3]),.y(w8));

half_adder u9(.a(w1),.b(w2),.sum(w9),.carry(w10));
half_adder u10(.a(w3),.b(w4),.sum(w11),.carry(w12));
half_adder u11(.a(w5),.b(w6),.sum(w13),.carry(w14));
half_adder u12(.a(w7),.b(w8),.sum(w15),.carry(w16));

half_adder u13(.a(w9),.b(w11),.sum(w17),.carry(w18));
half_adder u14(.a(w10),.b(w12),.sum(w19),.carry(w20));
half_adder u15(.a(w13),.b(w15),.sum(w21),.carry(w22));
half_adder u16(.a(w14),.b(w16),.sum(w23),.carry(w24));

half_adder u17(.a(w17),.b(w21),.sum(t1),.carry(t2));
half_adder u18(.a(w18),.b(w19),.sum(w25),.carry(w26));
half_adder u19(.a(w22),.b(w23),.sum(w27),.carry(w28));
or_gate u20(.a(w20),.b(w24),.y(t6));

half_adder u21(.a(w25),.b(w27),.sum(t3),.carry(t4));
or_gate u22(.a(w26),.b(w28),.y(t5));

decision u23(.a(t6),.b(t5),.c(t4),.d(t3),.e(t2),.f(t1),.en(en),
.miss(miss),.match(match),.fault(fault));
endmodule
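As wired above, the top level XORs the retrieved 8-bit word n against the 4-bit tag k replicated in both halves, accumulates the number of differing bits through the adder tree, and classifies the result in the decision module. The following Python model is a hypothetical software cross-check of that datapath (not part of the synthesized design); the replication and the 0-1/2-4/5+ thresholds are read off the xor_gate wiring and the decision truth table:

```python
def classify(n, k):
    """Software model of the proposedcodewords datapath (assumed behavior).

    The retrieved 8-bit word n is XORed against the 4-bit tag k replicated
    in both halves, mirroring the u1-u8 xor_gate wiring. The Hamming
    distance then selects the outcome with the decision-module thresholds:
    0-1 differing bits -> match, 2-4 -> fault, 5 or more -> miss.
    """
    diff = (n ^ ((k << 4) | k)) & 0xFF      # bitwise difference vector
    distance = bin(diff).count("1")         # Hamming distance (popcount)
    if distance <= 1:
        return "match"
    if distance <= 4:
        return "fault"
    return "miss"
```

For the first stimulus in tb_proposedcodewords (n = 8'hff, k = 4'hf), classify(0xFF, 0xF) returns "match", which is what the simulated match output should show.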

module sa_arch(a, b, sum);
input [7:0] a, b;
output sum;

wire w1, w2, w3, w4, w5, w6, w7, w8, w9,
w10, w11, w12, w13, w14;

xor_gate u1(.a(a[0]),.b(b[0]),.y(w1));
xor_gate u2(.a(a[1]),.b(b[1]),.y(w2));
xor_gate u3(.a(a[2]),.b(b[2]),.y(w3));
xor_gate u4(.a(a[3]),.b(b[3]),.y(w4));
xor_gate u5(.a(a[4]),.b(b[4]),.y(w5));
xor_gate u6(.a(a[5]),.b(b[5]),.y(w6));
xor_gate u7(.a(a[6]),.b(b[6]),.y(w7));
xor_gate u8(.a(a[7]),.b(b[7]),.y(w8));

// sa (sum-only XOR stage) is used throughout; the original listing
// instantiated an undefined module "ha" for u9-u12
sa u9(.a(w1),.b(w2),.sum(w9));
sa u10(.a(w3),.b(w4),.sum(w10));
sa u11(.a(w5),.b(w6),.sum(w11));
sa u12(.a(w7),.b(w8),.sum(w12));

sa u13(.a(w9),.b(w10),.sum(w13));
sa u14(.a(w11),.b(w12),.sum(w14));
sa u15(.a(w13),.b(w14),.sum(sum));

endmodule

APPENDIX B
IEEE BASE PAPER
