CHAPTER 1
INTRODUCTION
1.1 DECODE AND COMPARE ARCHITECTURE
Let us consider a cache memory in which a k-bit tag is stored as an n-bit codeword
after being encoded by an (n, k) code. In the decode-and-compare architecture depicted
in Fig. 1.1(a), the n-bit retrieved codeword must first be decoded to extract the original
k-bit tag. The extracted k-bit tag is then compared with the k-bit tag field of an incoming
address to determine whether the tags match. Because the retrieved codeword has to pass
through the decoder before being compared with the incoming tag, the critical path is too
long for a practical cache system designed for high-speed access. In addition, since the
decoder is one of the most complicated processing elements, the complexity overhead is
not negligible.
Note that decoding is usually more complex and takes more time than encoding as it
encompasses a series of error detection or syndrome calculation, and error correction. The
implementation results support the claim. To resolve the drawbacks of the decode-and-
compare architecture, therefore, the decoding of a retrieved codeword is replaced with the
Page 1
DEPT OF ECE GIST
encoding of an incoming tag in the encode-and-compare architecture. More precisely, a
k-bit incoming tag is first encoded to the
corresponding n-bit codeword X and compared with an n-bit retrieved codeword Y as shown
in Fig. 1.1(b). The comparison examines in how many bits the two codewords differ, rather
than checking whether they are exactly equal. For this, we compute the
Hamming distance d between the two codewords and classify the cases according to the
range of d. Let tmax and rmax denote the maximum numbers of correctable and detectable
errors, respectively. The cases are summarized as follows.
i. If d = 0, X matches Y exactly.
ii. If 0 < d ≤ tmax, X will match Y provided at most tmax errors in Y are corrected.
iii. If tmax < d ≤ rmax, Y has detectable but uncorrectable errors. In this case, the cache
may issue a system fault so that the central processing unit can take a proper action.
iv. If rmax < d, X does not match Y.
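The case analysis above can be sketched in a few lines of Python. This is only an illustration: the function names, the returned labels, and the example values tmax = 1 and rmax = 2 are assumptions made here, and a distance larger than rmax is treated as a plain mismatch.

```python
def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which the codewords x and y differ."""
    return bin(x ^ y).count("1")


def classify(x: int, y: int, t_max: int, r_max: int) -> str:
    """Classify the comparison of encoded tag x against retrieved codeword y."""
    d = hamming_distance(x, y)
    if d == 0:
        return "match"                      # case i
    if d <= t_max:
        return "match_after_correction"     # case ii
    if d <= r_max:
        return "uncorrectable_error"        # case iii
    return "mismatch"                       # d > r_max (assumed here)


if __name__ == "__main__":
    x = 0b10110100
    for y in (0b10110100, 0b10110101, 0b10110111, 0b10111011):
        print(f"{y:08b}", classify(x, y, t_max=1, r_max=2))
```

Only the range in which d falls matters for the decision, which is what allows the hardware to avoid computing the exact distance.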
Assuming that the incoming address has no errors, we can regard the two tags
as matched if d is in either the first or the second range. In this way, while
maintaining the error-correcting capability, the architecture removes the decoder
from its critical path at the cost of a newly introduced encoder. Note that the
encoder is, in general, much simpler than the decoder, and thus the encoding cost is
significantly less than the decoding cost. Since the above method needs to compute
the Hamming distance, a circuit dedicated to this computation has been presented. The
circuit shown in Fig. 2 first performs XOR operations on every pair of bits in X and Y so
as to generate a vector representing the bitwise difference of the two codewords. The
following half adders (HAs) count the number of 1's in two adjacent bits
of the vector. The numbers of 1's are accumulated by passing through the following
saturating adder (SA) tree. In the SA tree, the accumulated value z is saturated to
rmax + 1 if it exceeds rmax. More precisely, given inputs x and y, z can be expressed as
follows:
z = x + y if x + y ≤ rmax, and z = rmax + 1 otherwise. (1)
The final accumulated value indicates the range of d. As the compulsory saturation
necessitates additional logic circuitry, the complexity of an SA is higher than that of a
conventional adder.
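A behavioural sketch of the XOR, half-adder, and SA-tree pipeline described above is given below. It models values rather than gates, and the function names and the handling of an odd number of inputs are assumptions made for illustration.

```python
def saturating_add(x: int, y: int, r_max: int) -> int:
    """Add two partial counts, clamping the result at r_max + 1."""
    return min(x + y, r_max + 1)


def saturated_distance(x: int, y: int, n: int, r_max: int) -> int:
    """Hamming distance of two n-bit codewords, saturated at r_max + 1."""
    diff = x ^ y                                    # bitwise difference vector
    bits = [(diff >> i) & 1 for i in range(n)]
    # Half-adder stage: count the 1's in each pair of adjacent bits.
    level = [bits[i] + bits[i + 1] for i in range(0, n - 1, 2)]
    if n % 2:
        level.append(bits[-1])                      # unpaired last bit
    level = [min(c, r_max + 1) for c in level]
    # SA tree: accumulate pairwise until a single value remains.
    while len(level) > 1:
        nxt = [saturating_add(level[i], level[i + 1], r_max)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    print(saturated_distance(0xFF, 0x00, n=8, r_max=2))  # saturates at r_max + 1
```

Any result equal to rmax + 1 only says that the distance exceeds rmax, which is all the decision logic needs.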
1.2 DECODER
Let us suppose that a logic network has 2 inputs, A and B. They give rise
to 4 input combinations: A'B', A'B, AB', and AB. The truth table for this decoder is shown below:
For any input combination, only one of the outputs is low and all the others are
high. The low output identifies the state of the inputs.
Two or more small decoders with enable inputs can be combined to form a larger
decoder, e.g., a 3-to-8-line decoder constructed from two 2-to-4-line decoders. A decoder
with an enable input can also function as a demultiplexer.
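As a minimal sketch of the behaviour described above, the following models an active-low 2-to-4 decoder with an enable input and builds a 3-to-8 decoder from two of them. The output ordering (index 2·B + A) and the active-high enable are assumptions, since the original truth table is not reproduced here.

```python
def decoder_2to4(a: int, b: int, enable: int = 1) -> list:
    """Active-low 2-to-4 decoder: the selected output is 0, all others are 1."""
    outputs = [1, 1, 1, 1]
    if enable:
        outputs[2 * b + a] = 0      # assumed ordering: B is the high-order input
    return outputs


def decoder_3to8(a: int, b: int, c: int) -> list:
    """3-to-8 decoder from two 2-to-4 decoders; C selects which one is enabled."""
    low_half = decoder_2to4(a, b, enable=1 - c)   # active when C = 0
    high_half = decoder_2to4(a, b, enable=c)      # active when C = 1
    return low_half + high_half


if __name__ == "__main__":
    for value in range(8):
        a, b, c = value & 1, (value >> 1) & 1, (value >> 2) & 1
        print(c, b, a, decoder_3to8(a, b, c))
```

With the enable input treated as data and A, B as select lines, the same 2-to-4 block behaves as a 1-to-4 demultiplexer.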
1.3 ENCODER
An octal-to-binary encoder takes 8 inputs and provides 3 outputs, thus doing the opposite
of what the 3-to-8 decoder does. At any one time, only one input line has a value of 1.
The figure below shows the truth table of an octal-to-binary encoder.
For an 8-to-3 binary encoder with inputs I0-I7, the logic expressions of the outputs
Y0-Y2 are:
Y0 = I1 + I3 + I5 + I7
Y1 = I2 + I3 + I6 + I7
Y2 = I4 + I5 + I6 + I7
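The three output expressions can be checked directly. The short sketch below evaluates them for every one-hot input; the function name and the list representation of I0-I7 are illustrative choices.

```python
def octal_to_binary(inputs: list) -> tuple:
    """8-to-3 encoder; inputs[k] is line Ik and exactly one line is assumed to be 1."""
    i = inputs
    y0 = i[1] | i[3] | i[5] | i[7]
    y1 = i[2] | i[3] | i[6] | i[7]
    y2 = i[4] | i[5] | i[6] | i[7]
    return y2, y1, y0


if __name__ == "__main__":
    for k in range(8):
        one_hot = [1 if n == k else 0 for n in range(8)]
        print(k, octal_to_binary(one_hot))
```

For each active line Ik the outputs (Y2, Y1, Y0) spell k in binary, which is the inverse of the 3-to-8 decoder mapping.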
For example, along with being able to add and subtract binary numbers, we
need to be able to compare them and determine whether the value of input A is greater
than, smaller than, or equal to the value at input B. The digital comparator
accomplishes this using several logic gates that operate on the principles of Boolean
algebra. There are two main types of digital comparator: the identity comparator and the
magnitude comparator.
Digital comparators actually use Exclusive-NOR gates within their design for
comparing their respective pairs of bits. When we are comparing two binary or BCD
values or variables against each other, we are comparing the "magnitude" of these
values, logic "0" against logic "1".
Inputs    Outputs
B  A      A>B  A=B  A<B
0  0       0    1    0
0  1       1    0    0
1  0       0    0    1
1  1       0    1    0
bit adder in the previous tutorial. Multi-bit comparators can be constructed to compare
whole binary or BCD words to produce an output if one word is larger, equal to or
less than the other.
A very good example of this is the 4-bit magnitude comparator. Here, two
4-bit words ("nibbles") are compared with each other to produce the relevant output, with
one word connected to inputs A and the other, to be compared against it, connected to
inputs B as shown below.
When comparing large binary or BCD numbers like the example above, to
save time the comparator starts by comparing the highest-order bit (MSB) first. If
equality exists, A = B, it compares the next lowest bit and so on until it reaches
the lowest-order bit (LSB). If equality still exists, the two numbers are defined as
being equal. If inequality is found, either A > B or A < B, the relationship between the
two numbers is determined and the comparison of any additional lower-order
bits stops. Digital comparators are used widely in analogue-to-digital converters
(ADCs) and arithmetic logic units (ALUs) to perform a variety of arithmetic
operations.
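The MSB-first procedure can be sketched as follows. This is a sequential software model of the comparison order, not a gate-level design, and the function name and bit-list representation are assumptions.

```python
def compare_msb_first(a_bits: list, b_bits: list) -> tuple:
    """Compare two equal-length words given MSB first.

    Returns the three comparator outputs (A>B, A=B, A<B).
    """
    for a, b in zip(a_bits, b_bits):
        equal = 1 - (a ^ b)          # Exclusive-NOR of the bit pair
        if not equal:
            # The first differing bit decides; lower-order bits are ignored.
            return (1, 0, 0) if a > b else (0, 0, 1)
    return (0, 1, 0)                 # every bit pair was equal


if __name__ == "__main__":
    print(compare_msb_first([1, 0, 1, 1], [1, 0, 0, 1]))
    print(compare_msb_first([0, 1, 1, 1], [1, 0, 0, 0]))
    print(compare_msb_first([1, 1, 0, 0], [1, 1, 0, 0]))
```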
The Hamming distance between two strings of equal length is the number of
positions at which the corresponding symbols are different. Put another way, it
measures the minimum number of substitutions required to change one string into the
other, or the minimum number of errors that could have transformed one string into
the other. A major application is in coding theory, more specifically in block codes, in
which the equal-length strings are vectors over a finite field.
For a fixed length n, the Hamming distance is a metric on the set of
words of length n (also known as a Hamming space), as it fulfills the conditions of
non-negativity, identity of indiscernibles, and symmetry, and it can be shown by complete
induction that it satisfies the triangle inequality as well. The Hamming distance
between two words a and b can also be seen as the Hamming weight of a − b for an
appropriate choice of the − operator. For binary strings a and b, the Hamming distance
is equal to the number of ones (population count) in a XOR b. The metric space of
length-n binary strings, with the Hamming distance, is known as the Hamming cube;
it is equivalent as a metric space to the set of distances between vertices in a
hypercube graph. One can also view a binary string of length n as a vector in R^n by
treating each symbol in the string as a real coordinate; with this embedding, the
strings form the vertices of an n-dimensional hypercube, and the Hamming distance of
the strings is equivalent to the Manhattan distance between the vertices.
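A short sketch of both views of the definition: position-by-position comparison for general strings, and the population count of a XOR b for binary words. The function names and example strings are illustrative.

```python
def hamming_distance_strings(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def hamming_distance_words(a: int, b: int) -> int:
    """Hamming distance of two binary words as the popcount of a XOR b."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(hamming_distance_strings("karolin", "kathrin"))
    print(hamming_distance_strings("1011101", "1001001"))
    print(hamming_distance_words(0b1011101, 0b1001001))
```

For binary strings the two functions agree, and the value also equals the Manhattan distance between the corresponding hypercube vertices.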
CHAPTER 2
ERROR CORRECTION CODES
This chapter describes a new architecture that can reduce the latency and complexity of
data comparison by using the characteristics of systematic codes. In addition, a new
processing element is presented to reduce the latency and complexity further.
Fig. 2.1 Timing diagram of the tag match in (a) direct compare method
and (b) proposed architecture
d = 8I + 4 (J + K + M) + 2 (L + N + O) + P. (2)
Note that sum-bit lines are dotted for visibility. Since what we need is not the
precise Hamming distance but the range it belongs to, the circuit can be simplified.
When rmax = 1, for example, two or more 1's among the input bits
can all be regarded as the same case, which falls in the fourth range. In that case, we can
replace several HAs with a simple OR-gate tree as shown in Fig. 2.4(b). This is an
advantage over the SA, which resorts to the compulsory saturation expressed in (1).
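The weight structure of equation (2) can be reproduced with half adders alone. The sketch below is one reading of the butterfly arrangement that is consistent with (2): adjacent bits are paired, the sum bits keep their weight, the carry bits move to twice the weight, and each group is processed again until it holds a single bit. The recursion, the function names, and the power-of-two input length are assumptions; the exact wiring of the figure is not claimed.

```python
def half_adder(a: int, b: int) -> tuple:
    """Return (sum, carry) of two bits; a + b == sum + 2 * carry."""
    return a ^ b, a & b


def bwa(bits: list, weight: int = 1) -> list:
    """Reduce a power-of-two number of equally weighted bits with half adders.

    Returns a list of (weight, bit) pairs whose weighted sum equals the
    number of 1's among the inputs.
    """
    if len(bits) == 1:
        return [(weight, bits[0])]
    sums, carries = [], []
    for i in range(0, len(bits), 2):
        s, c = half_adder(bits[i], bits[i + 1])
        sums.append(s)
        carries.append(c)
    return bwa(carries, 2 * weight) + bwa(sums, weight)


def weighted_sum(outputs: list) -> int:
    return sum(w * b for w, b in outputs)


if __name__ == "__main__":
    outputs = bwa([1, 0, 1, 1, 0, 1, 1, 1])
    print(sorted((w for w, _ in outputs), reverse=True))
    print(weighted_sum(outputs))
```

For 8 inputs this yields one output of weight 8, three of weight 4, three of weight 2, and one of weight 1, matching the coefficients in (2). Because no carry propagation is needed, the depth is log2 of the input count.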
Note that in Fig. 2.4, there is no overlap between any pair of carry-bit lines
or any pair of sum-bit lines. As overlaps exist only between carry-bit lines and
sum-bit lines, they are not hard to resolve in contemporary technology, which
provides multiple routing layers, no matter how many bits a butterfly-formed weight
accumulator (BWA) takes.
Fig. 2.4. Proposed BWA. (a) General structure and (b) new structure revised
for the matching of ECC-protected data.
We now explain the overall architecture in more detail. Each XOR stage
generates the bitwise difference vector for either data bits or parity bits, and the
following processing elements count the number of 1's in the vector, i.e., the
Hamming distance. Each BWA at the first level is in the revised form shown in Fig.
2.4(b), and generates an output from the OR-gate tree and several weight bits from the
HA trees. In the interconnection, such outputs are fed into their associated processing
elements at the second level. The output of the OR-gate tree is connected to the
subsequent OR-gate tree at the second level, and the remaining weight bits are
connected to the second level BWAs according to their weights. More precisely, the
bits of weight w are connected to the BWA responsible for w-weight inputs. Each
BWA at the second level is associated with a weight of a power of two that is less
than or equal to Pmax, where Pmax is the largest power of two that is not greater than
rmax + 1. As the weight bits associated with the fourth range are all ORed in the
revised BWAs, there is no need to deal with the powers of two that are larger than
Pmax.
For example, let us consider a simple (8, 4) single-error correction double-error
detection code. The corresponding first and second level circuits are shown in Fig. 2.5.
Note that the encoder and XOR banks are not drawn in Fig. 2.5 for the sake of
simplicity. Since rmax = 2, Pmax = 2 and there are only two BWAs, dealing with
weights 2 and 1, at the second level. As the bits of weight 4 fall in the fourth range,
they are ORed. The remaining bits associated with weight 2 or 1 are connected to their
corresponding BWAs. Note that the interconnection induces no hardware complexity,
since it can be achieved by hard wiring.
Taking the outputs of the preceding
circuits, the decision unit finally determines whether the incoming tag matches the retrieved
codeword by considering the four ranges of the Hamming distance. The decision unit
is in fact a combinational logic block whose functionality is specified by a truth table that
takes the outputs of the preceding circuits as inputs. For the (8, 4) code whose
first and second level circuits are shown in Fig. 2.5, the truth table for the
decision unit is described in Table I. Since U and V cannot be set simultaneously,
such cases are implicitly included in the don't-care terms in Table I.
Fig. 2.5. First and second level circuits for an (8, 4) code.
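The rule for choosing the second-level weights can be written down directly. The following is a minimal sketch that assumes nothing beyond the definition of Pmax given in the text; the function name is illustrative.

```python
def second_level_weights(r_max: int) -> tuple:
    """Return (Pmax, weights) for a given r_max.

    Pmax is the largest power of two not greater than r_max + 1; the
    second level needs one BWA per power of two from Pmax down to 1.
    """
    p_max = 1
    while p_max * 2 <= r_max + 1:
        p_max *= 2
    weights = []
    w = p_max
    while w >= 1:
        weights.append(w)
        w //= 2
    return p_max, weights


if __name__ == "__main__":
    # (8, 4) SEC-DED example from the text: r_max = 2
    print(second_level_weights(2))
```

For rmax = 2 this gives Pmax = 2 and BWAs for weights 2 and 1, in agreement with the (8, 4) example; bits of weight 4 and above are handled by the OR-gate tree.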
The complexity as well as the latency of combinational circuits heavily depends on
the algorithm employed. In addition, as complexity and latency usually
conflict with each other, it is unfortunately hard to derive an analytical and fully
deterministic equation that shows the relationship between the number of gates and
the latency for the proposed architecture and also for the conventional SA-based
architecture. To circumvent the difficulty in analytical derivation, we present
instead an expression that can be used to estimate the complexity and the latency by
employing some variables for the nondeterministic parts. The complexity of the
proposed architecture, C, can be expressed as
where CXOR, CENC, C2nd, CDU, and CBWA(n) are the complexities of the
XOR banks, an encoder, the second level circuits, the decision unit, and a BWA for n
inputs, respectively. Using the recurrence relation, CBWA(n) can be calculated as
where LXOR, LENC, L2nd, LDU, and LBWA(n) are the latencies of an XOR
bank, an encoder, the second level circuits, the decision unit, and a BWA for n inputs,
respectively. Note that the latencies of the OR-gate tree and BWAs for x ≤ n inputs at
the second level are all bounded by log2 n. As one of the BWAs at the first level finishes
earlier than the other, some components at the second level may start earlier.
Similarly, some BWAs or the OR-gate tree at the second level may provide their
output earlier to the decision unit so that the unit can begin its operation without
waiting for all of its inputs. In such cases, L2nd and LDU can be partially hidden by
the critical path of the preceding circuits, and L becomes shorter than the given
expression.
2.2 EXCLUSIVE OR
Using the three basic gates, the AND gate, the OR gate, and the NOT gate, we can build
many other types of logic gate functions, such as a NAND gate and a NOR gate, or any
other type of digital logic function. But there are two other types of digital logic gates
which, although they are not basic gates in their own right because they are constructed by
combining other logic gates, have output Boolean functions important
enough to be considered complete logic gates.
These two "hybrid" logic gates are called the Exclusive-OR (Ex-OR) gate and its
complement, the Exclusive-NOR (Ex-NOR) gate. Previously, we saw that for a 2-input
OR gate, if A = "1", OR B = "1", OR BOTH A + B = "1", then the output from the
digital gate must also be at logic level "1". Because of this, this type of logic gate
is known as an Inclusive-OR function. The gate gets its name from the fact that it
includes the case of Q = "1" when both A and B = "1".
If, however, a logic output "1" is obtained only when A = "1" or when B = "1", but NOT
both together at the same time, i.e., for the binary inputs "01" or "10", then
this type of gate is known as an Exclusive-OR function, or more commonly an Ex-OR
function for short. This is because its Boolean expression excludes the "OR BOTH"
case of Q = "1" when both A and B = "1". In other words, the output of an Exclusive-OR
gate ONLY goes "HIGH" when its two input terminals are at "DIFFERENT"
logic levels with respect to each other. An odd number of logic "1"s on its inputs
gives a logic "1" at the output. These two inputs can be at logic level "1" or at logic
level "0", giving us the Boolean expression: Q = (A ⊕ B) = A'.B + A.B'
The Exclusive-OR gate function is achieved by combining standard
logic gates to form more complex gate functions that are used extensively
in building arithmetic logic circuits, computational logic comparators, and error
detection circuits.
B  A  Q
0  0  0
0  1  1
1  0  1
1  1  0
The truth table above shows that the output of an Exclusive-OR gate ONLY
goes "HIGH" when its two input terminals are at "DIFFERENT" logic levels
with respect to each other. If the two inputs, A and B, are both at logic level "1" or
both at logic level "0", the output is a "0", making the gate an "odd but not the even
gate".
This ability of the Exclusive-OR gate to compare two logic levels and produce
an output value dependent upon the input condition is very useful in computational
logic circuits, as it gives us the following Boolean expression:
Q = (A ⊕ B) = A'.B + A.B'
An Ex-OR function with more than two inputs is called an "odd
function" or modulo-2 sum (Mod-2-SUM), not an Ex-OR. This description can be
expanded to apply to any number of individual inputs, as shown below for a 3-input
Ex-OR gate.
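The two-input expression and its extension to an odd function can be checked with a small sketch. The function names are illustrative, and the complement A' is modelled as 1 − A.

```python
from functools import reduce
from itertools import product
from operator import xor


def ex_or_gates(a: int, b: int) -> int:
    """Two-input Ex-OR built from NOT, AND, and OR: Q = A'.B + A.B'."""
    return ((1 - a) & b) | (a & (1 - b))


def odd_function(*inputs: int) -> int:
    """Modulo-2 sum of any number of inputs: 1 when an odd number of them are 1."""
    return reduce(xor, inputs)


if __name__ == "__main__":
    for c, b, a in product((0, 1), repeat=3):
        print(c, b, a, odd_function(a, b, c))
```

The printed table for three inputs shows a "1" exactly on the rows that contain one or three 1's.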
Binary addition follows the same basic rules as the denary
addition above, except that in binary there are only two digits, with the largest digit being
"1". So when adding binary numbers, a carry out is generated when the "SUM"
equals or is greater than two (1+1), and this becomes a "CARRY" bit for any
subsequent addition,
being passed over to the next column for addition and so on. Consider the single bit
addition below.
  0      0      1      1
+ 0    + 1    + 0    + 1
---    ---    ---    ---
  0      1      1     10   (carry 1, sum 0)
When the two single bits, A and B, are added together, the additions "0 + 0",
"0 + 1", and "1 + 0" result in either a "0" or a "1", until we get to the final column,
"1 + 1", where the sum is equal to "2". The digit two does not exist in binary;
2 in binary is equal to 10, in other words a zero for the sum plus an extra
carry bit.
The operation of a simple adder therefore requires two data inputs and produces two
outputs, the Sum (S) of the equation and a Carry (C) bit as shown.
For the simple 1-bit addition problem above, the resulting carry bit could be
ignored but you may have noticed something else with regards to the addition of these
two bits, the sum of their binary addition resembles that of an Exclusive-OR Gate. If
we label the two bits as A and B then the resulting truth table is the sum of the two
bits but without the final carry.
2-input Exclusive-OR Gate
B  A  S
0  0  0
0  1  1
1  0  1
1  1  0
We can see from the truth table above that an Exclusive-OR gate only
produces an output "1" when either input is at logic "1", but not both, the same as for the
binary addition of the previous two bits. However, in order to perform the addition of
two numbers, microprocessors and electronic calculators require the extra carry bit to
calculate correctly, so we need to rewrite the previous summation to
include two bits of output data as shown below.
  00      00      01      01
+ 00    + 01    + 00    + 01
----    ----    ----    ----
  00      01      01      10
B A C
0 0 0
0 1 0
1 0 0
1 1 1
Combining the Exclusive-OR gate with the AND gate results in a simple
digital binary adder circuit commonly known as the "Half Adder" circuit.
B A SUM CARRY
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
From the truth table of the half adder we can see that the SUM (S) output is
the result of the Exclusive-OR gate and the Carry-out (Cout) is the result of the AND
gate. The Boolean expressions for a half adder are therefore as follows.
SUM = A ⊕ B
CARRY = A.B
One major disadvantage of the Half Adder circuit when used as a binary adder
is that there is no provision for a "Carry-in" from the previous circuit when adding
together multiple data bits. For example, suppose we want to add together two 8-bit
bytes of data. Any resulting carry bit would need to be able to "ripple" or move across
the bit patterns starting from the least significant bit (LSB). The most complicated
operation the half adder can do is "1 + 1", but as the half adder has no carry input, the
resulting sum would be incorrect. One simple way to overcome this problem is
to use a Full Adder type binary adder circuit.
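As a minimal sketch of the point made above, the following builds a full adder from two half adders and an OR gate, then chains full adders so that the carry ripples from the LSB upward. The function names and the 8-bit default width are illustrative.

```python
def half_adder(a: int, b: int) -> tuple:
    """SUM = A xor B, CARRY = A and B."""
    return a ^ b, a & b


def full_adder(a: int, b: int, carry_in: int) -> tuple:
    """Full adder from two half adders plus an OR gate for the carries."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, carry_in)
    return s, c1 | c2


def ripple_add(x: int, y: int, width: int = 8) -> tuple:
    """Add two width-bit numbers; returns (sum modulo 2**width, carry out)."""
    carry, result = 0, 0
    for i in range(width):                    # start at the LSB
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(half_adder(1, 1))
    print(ripple_add(0b10110101, 0b01101011))
```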
CHAPTER 3
3.1 INTRODUCTION
This means that the power consumption will be minimized without violating
this performance constraint. Power-aware systems are typically systems that have
limited power budgets but provide a respectable performance as well. When these
circuits are designed, low power consumption is important as well, but the point
of view is different: in power-aware design, the performance is maximized subject to a
power budget.
The following methodologies are the most powerful ones and applicable to
virtually every system. They include static voltage scaling, frequency scaling, and
various other kinds of voltage scaling (sometimes combined with frequency scaling),
clock gating and power gating. Finally, a section is dedicated to technology scaling.
However, the next paragraph discusses how easily this process can
be slowed down. Operating at the normal supply voltage theoretically assures correct
operation for a certain period of time, but if we require high reliability and need to
depend on the technology for long periods of time (such as in biomedical implants), it
is wise to reduce VDD not only for power-saving purposes but also to increase
reliability. Even a small reduction in the supply voltage can substantially diminish
the device degradation over time, since device aging is nearly exponentially
dependent on VDD.
Apart from VDD, there is another variable in the equation of section 2.3 that
intuitively suggests possibilities for reducing power: frequency. Of course, the clock
frequency is bounded by the desired throughput of the system. However, a system
rarely operates at its maximum throughput all the time. Often, the desired throughput is
much lower than its maximum performance. Then, it is possible to lower the
operating frequency in order to save power (and thus also energy), which is called
frequency scaling. Sometimes voltage scaling and frequency scaling are employed
simultaneously, such as in cell phones when they are in stand-by mode.
Ideally, we want access to an infinite number of supply voltages, so that the
optimal voltage can be chosen. In reality, this is impossible, and we have to work with
a large but limited number of voltages. Apart from lowering the supply voltage, it is
also possible to lower the clock frequency. A reduction in VDD will always
increase the delay to some extent, but if the cycle time is still much longer than the
delay of the circuit, energy is wasted. Therefore, dynamic voltage and frequency
scaling (DVFS) can be employed to save as much energy as possible.
The ratio X/Y is determined by the integer N. Since the duty controller runs at
64 MHz (which is 64 times faster than the timing controller), and N can represent 64
numbers, we are able to create 64 different values of VDDL. For example, if N = 32,
then VDD is turned on for 0.5 μs and for the remaining 0.5 μs the value is zero. Then
the average value of VDDL is 0.5 · VDD. The low-pass filter is placed off-chip.
Finally, there is the feedback loop back to the speed detector. If VDD is 1 V, which
is common in 90nm CMOS, this variable supply voltage scheme can provide a
resolution of 15.6 mV (meaning that VDDL can be varied in steps as small as 15.6
mV). Obviously, the higher the frequency of the buck converter (and the larger the range
of the counter), the finer the resolution, and the closer VDDL will be to the optimal value.
It appears that the external frequency fext assumes some predefined values (based on
different performance requirements), such that VDDL can be fine-tuned for the
specific frequency.
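The numbers in this paragraph follow from simple duty-cycle arithmetic, sketched below. The function names are illustrative, the 1 μs period is inferred from the 64 MHz duty controller being 64 times faster than the timing controller, and ideal filtering is assumed.

```python
def vddl_average(n: int, vdd: float = 1.0, steps: int = 64) -> float:
    """Average output voltage when VDD is switched on for n out of `steps` slots."""
    if not 0 <= n < steps:
        raise ValueError("n must be in the range 0..steps-1")
    return vdd * n / steps


def on_time_us(n: int, period_us: float = 1.0, steps: int = 64) -> float:
    """Time per period (in microseconds) during which VDD is switched on."""
    return period_us * n / steps


if __name__ == "__main__":
    print(vddl_average(32), on_time_us(32))   # half of VDD, on for half the period
    print(vddl_average(1) * 1000, "mV per step")
```

One step corresponds to VDD/64, i.e. about 15.6 mV for a 1 V supply, which is the resolution quoted in the text.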
This scheme is presented in such detail mainly to
provide a deeper insight into how dynamic voltage scaling works, but also to
show that a significant amount of overhead is required for this technique. Multi-level
voltage scaling is a form of dynamic voltage scaling and essentially an extension of
static voltage scaling. Based on the required performance, the supply voltage can be
scaled between a small number of fixed and discrete voltage levels. The advantage of
multi-level voltage scaling is that it is a significantly less expensive power scheme
than DVS with a virtually infinite number of supply voltages.
The Dual Variable Supply-voltage scheme (Dual-VS) is a combination
of DVS and clustered voltage scaling (CVS). First, the circuit is clustered into a
high-VDD and a low-VDD cluster. Both supply voltages are variable and, in
contrast with multi-level voltage scaling, non-discrete. The minimal voltage is
controlled by loops for both the high and low supply voltages.
Assume that we, for example, require a rate (normalized frequency) of 0.5 for
a certain period of time. If we do not implement a technique that allows us to
dynamically adjust parameters such as voltage and frequency, and the circuit always
operates at the maximum rate of 1, a large part of the total dissipated energy is
wasted. When utilizing DVFS, fclk is lowered (in this case by 50%), and since
a lower frequency is required, the supply voltage can be decreased as well. However,
DVFS also has an important drawback: access to a vast (ideally infinite) number of
different supply voltages requires a significant amount of hardware overhead.
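To make the saving concrete, the sketch below compares the energy dissipated over the same time interval for three operating points, assuming the usual dynamic power form P = C · VDD² · f (the same form as equation (9) later in this chapter). The function name and all numeric values, including the reduced supply of 0.7, are hypothetical illustrations and not measurements.

```python
def dynamic_energy(c: float, vdd: float, f: float, t: float) -> float:
    """Energy dissipated in time t assuming P = C * VDD^2 * f."""
    return c * vdd ** 2 * f * t


if __name__ == "__main__":
    c, t = 1.0, 1.0                                           # normalized capacitance and time
    always_full_rate = dynamic_energy(c, vdd=1.0, f=1.0, t=t)  # no scaling at all
    frequency_only = dynamic_energy(c, vdd=1.0, f=0.5, t=t)    # frequency scaling
    dvfs = dynamic_energy(c, vdd=0.7, f=0.5, t=t)              # hypothetical lowered VDD
    print(frequency_only / always_full_rate)
    print(dvfs / always_full_rate)
```

Halving the frequency alone halves the energy over the interval; lowering VDD as well reduces it further through the quadratic voltage term.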
In the previous sections we have referred to (sub)systems that do not always
operate at their maximum performance. It is also possible that parts of a system are
idle for a period of time: then no useful computational work is performed. Still,
power is consumed. A subsystem being idle does not necessarily mean that the
subsystem is not performing any computations. It only means that the results are not
being utilized. This is possible when the subsystem is still fed with data, but the result
is discarded because it is not needed at that moment. If there are large registers
present in these subsystems, this power dissipation can become quite significant.
And, finally, there is the power dissipation of the clock network in the subsystem.
Clock networks are very expensive in terms of power. A major portion of the
total power consumption of the system is dissipated in the clock network (mainly in
the clock buffers/drivers). Considering the above, a lot of power can be
saved when a subsystem is idle. One way to achieve this goal is to apply clock gating.
This essentially means that the clock signal of the subsystem is cut off. This will save
the power dissipated in flip-flops and the clock network. If the combinational logic in
the subsystem is fed by registers at the inputs, the logic will stop switching. It will,
however, not save the leakage power.
Transistors with low VTH are suitable for high performance, but not for low
leakage, and vice versa. Therefore, the transistors in the circuit have low threshold
voltages and the switches have high threshold voltages. This is called a dual-threshold
voltage technology or MTCMOS (multi-threshold CMOS). These high-VTH
transistors do, however, cause a problem. Since VDD is low and VTH is relatively
high, these transistors will be slow. In order to speed them up, we would need to
resize them. When a block is asleep it takes some time to wake it up again, just as it
takes some time to put a block to sleep.
This introduces additional delays. Also, during wake-up and going to sleep,
some leakage power is still dissipated, which makes power gating imperfect. The
essential criteria for implementing power gating are the total leakage power component
and how many blocks are idle and how often. The leakage power highly depends on
the technology being utilized, and the impact of the leakage power highly depends on
the system frequency being utilized. If the leakage component is significant and many
blocks are idle for longer periods of time, power gating may be efficient. One should,
however, be aware of the fact that power gating is much more difficult to implement
than clock gating and leads to significantly higher costs (mainly because of all the
switches that are required). It is also important to realize that power gating is much
more invasive than clock gating. While clock gating does not affect the functionality
of the system, power gating does. It affects inter-block communication and, as
mentioned before, adds time delays to safely enter and exit power-gated modes.
3.4.8 Technology Scaling
Another way to save energy is to improve and scale the technology. Over the
last decades, CMOS technology has improved and scaled from 10 μm in the early
1970s to 32 nm in 2010. Sizes as small as 11 nm are expected around the year 2015.
Ideally, the voltages and linear dimensions scale with the scaling factor while the
electric fields remain constant, as explained in section 2.4.1. Therefore, the energy
savings scale with the cube of the scaling factor (VDD and the capacitance of the
transistors each scale with the factor, and power/energy has a
quadratic dependence on VDD). In reality, it is difficult to scale VTH along with
VDD.
3.5.1 Parallelization
3.5.2 Pipelining
This results in the following:
Ppipe = Cpipe · Vpipe^2 · fpipe (9)
Ppipe = ((Cpipe/Cref) · Cref) · ((Vpipe/Vref) · Vref)^2 · fref (10)
Ppipe = O((Vpipe/Vref)^2 · Pref) (11)
So, also here, the power savings are upper bounded by the supply voltage reduction. The
total capacitance being switched has increased slightly because of the extra pipeline
registers, so Cpipe/Cref is slightly larger than one.
The supply voltage can be reduced to the same extent as in parallelization.
This technique is, however, much more interesting for designs with limited area
budgets. However, if there is a feedback loop in the circuit, pipelining cannot be
employed. Note that pipelining and parallelization can also be employed
simultaneously to obtain even larger power savings. The same upper bound holds
for the combined methods, but it can be even smaller, since the critical path has now
been reduced to 4T instead of 2T. A voltage reduction of approximately 60% (a voltage
ratio of 0.4, the point where the delay has quadrupled) is now possible. Note that the
relation between VDD and delay may vary between different types of technology, so the
numbers presented serve only as an indication.
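The bound can be illustrated numerically under the assumption that power follows P = C · V² · f with the clock frequency left unchanged by pipelining. The function name and the 10% capacitance overhead are hypothetical; the voltage ratio of 0.4 is the figure mentioned in the text.

```python
def pipelined_power_ratio(cap_ratio: float, vdd_ratio: float) -> float:
    """Ppipe / Pref for P = C * V^2 * f with fpipe equal to fref.

    cap_ratio is Cpipe / Cref (slightly above one because of the extra
    pipeline registers); vdd_ratio is Vpipe / Vref.
    """
    return cap_ratio * vdd_ratio ** 2


if __name__ == "__main__":
    # Hypothetical 10% capacitance overhead, supply scaled to 0.4 of the reference.
    print(pipelined_power_ratio(cap_ratio=1.1, vdd_ratio=0.4))
```

The quadratic voltage term dominates: the small capacitance overhead barely offsets the saving obtained from the lower supply.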
The paths do have a certain delay, as they would in a real-life situation. Path 1,
however, has a slightly longer delay than path 2, causing a spurious transition at
output Z. These undesired transitions can be eliminated by path balancing (a.k.a. path
equalization), a technique that makes sure that the delays of all paths that converge at
each gate are about equal [11, 21]. This can be done by inserting unit-delay buffers in
the paths that are shorter than the others.
Since an n-input gate (e.g. a 4-input AND gate) can have significant differences
in input capacitance between its various inputs, it is wise to connect the net with the
highest switching activity to the input pin with the lowest input capacitance and vice
versa, to reduce power (the higher the capacitive load, the higher the power
dissipation). For example, the input capacitance of a 4-input AND gate in UMC 90nm
technology can be observed. High input capacitances result in slow logic and high
power consumption. It is better to decompose a gate with a high fan-in into a network
of multiple gates with a low fan-in, which significantly reduces the total capacitance.
For example, in the UMC 90nm library, gates have a maximum fan-in of four inputs.
Typically, the synthesis tool takes care of these optimizations. For example, a 16-input
AND gate described in VHDL will be decomposed by the synthesis tool and
implemented by e.g. a tree of 4-input AND gates.
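The decomposition mentioned above can be sketched functionally: a 16-input AND is equivalent to four 4-input AND gates feeding a fifth one. This only models the logic function; the function names are illustrative and no claim is made about what a particular synthesis tool produces.

```python
from itertools import product


def and4(a: int, b: int, c: int, d: int) -> int:
    """A single 4-input AND gate (the maximum fan-in assumed here)."""
    return a & b & c & d


def and16(bits: list) -> int:
    """16-input AND built as a two-level tree of five 4-input AND gates."""
    if len(bits) != 16:
        raise ValueError("expected exactly 16 input bits")
    first_level = [and4(*bits[i:i + 4]) for i in range(0, 16, 4)]
    return and4(*first_level)


if __name__ == "__main__":
    print(and16([1] * 16))
    print(and16([1] * 15 + [0]))
```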
At the lowest abstraction level, the technology (or transistor) level, a number
of optimizations can be performed in order to reduce power consumption. When no
low-power methodologies have been applied to the circuit, optimizations at the
technology level will not be necessary. But if the supply voltage is altered and the
delay of the circuit is compromised by low-power design methodologies,
optimizations at the technology level may be desired. Optimizations at the technology
level include adjusting the threshold voltage of the transistors and/or altering their sizes.
Fig 3.7 Energy consumption versus scaling factor N for various values of P
In conclusion, simultaneously optimizing VDD, VTH, and transistor size will lead to
the optimal result. By not only reducing VDD, but also optimizing VTH and
transistor size, it is possible to achieve significant energy savings without compromising
the speed of the circuit.
Standard CMOS is the most common and widely used digital logic style in
almost every application field. Still, other digital logic styles exist and are used in
industry as well. Since this thesis work aims for special design characteristics,
such as very low power consumption and very small designs, it is fair to have a
look at other digital logic design styles as well. First, the fundamental differences
between static and dynamic logic will be explained. Then, the three most interesting
alternative digital logic styles are discussed.
In static logic circuits the clock signal is only used for memory cells
(flip-flops). Pure combinational circuits do not need a clock signal at all. In dynamic logic,
all cells are clocked (this is the reason why dynamic logic is also referred to as
clocked logic), even if the circuit is purely combinational. This may seem odd,
especially given the fact that the clock network is one of the largest energy consumers
in almost any design, but dynamic logic provides a number of advantages. Dynamic
logic is actually commonly used in computer memories nowadays. All types of
DRAM (Dynamic Random Access Memory) are dynamic logic. Another well-known
application of dynamic logic is domino logic.
CHAPTER 4
VERILOG
4.1 OVERVIEW
Sequential statements are placed inside a begin/end block and executed in
sequential order within the block. But the blocks themselves are executed
concurrently, qualifying Verilog as a dataflow language.
4.2. HISTORY
4.2.1. Beginning
Cadence now has full proprietary rights to Gateway's Verilog and to Verilog-XL,
the HDL simulator that would become the de facto standard (of Verilog
logic simulators) for the next decade. Originally, Verilog was intended to describe and
allow simulation; only afterwards was support for synthesis added.
4.2.2. Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make
the language available for open standardization. Cadence transferred Verilog into the
public domain under the Open Verilog International (OVI) (now known as Accellera)
organization. Verilog was later submitted to IEEE and became IEEE Standard 1364-
1995, commonly referred to as Verilog-95.
In the same time frame Cadence initiated the creation of Verilog-A to put
standards support behind its analog simulator Spectre. Verilog-A was never intended
to be a standalone language and is a subset of Verilog-AMS, which encompassed
Verilog-95.
Foundations of Superlog and Vera were donated to Accellera, which later became
the IEEE standard P1800-2005: SystemVerilog.
This productivity crisis (along with a similar one on the design side) led to the
creation of Accellera, a consortium of EDA companies and users who wanted to
create the next generation of Verilog. The donation of the Open-Vera language
formed the basis for the HVL features of SystemVerilog. Accellera's goal was met in
November 2005 with the adoption of the IEEE standard P1800-2005 for
SystemVerilog, IEEE (2005).
4.3 EXAMPLES
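module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule

module toplevel(clock, reset);   // module header missing in the source; name assumed
  input clock;
  input reset;
  reg flop1;
  reg flop2;
  always @ (posedge reset or posedge clock)
    if (reset)
      begin
        flop1 <= 0;
        flop2 <= 1;
      end
    else
      begin
        flop1 <= flop2;
        flop2 <= flop1;   // completes the swap described in the text
      end
endmodule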
The "<=" operator in Verilog is another aspect of its being a hardware
description language as opposed to a normal procedural language. This is known as a
"non-blocking" assignment. Its action does not register until the right-hand sides of all
assignments in the current time step have been evaluated, so every target sees the old values.
This means that the order of the assignments is irrelevant and will produce
the same result: flop1 and flop2 will swap values every clock. The other assignment
operator, "=", is referred to as a blocking assignment. When "=" assignment is used,
for the purposes of logic, the target variable is updated immediately.
In the above example, had the statements used the "=" blocking operator
instead of "<=", flop1 and flop2 would not have been swapped. Instead, as in
traditional programming, the compiler would understand to simply set flop1 equal to
flop2 (and subsequently ignore the redundant logic to set flop2 equal to flop1).
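The difference between the two assignment operators can be observed in simulation. The following is a minimal sketch, not part of the original examples: the testbench name swap_demo and the register names nb1, nb2, b1 and b2 are hypothetical.

```verilog
// Hypothetical testbench contrasting "<=" and "=" in a swap.
module swap_demo;
    reg clock = 0;
    reg nb1 = 0, nb2 = 1;   // updated with non-blocking assignments
    reg b1  = 0, b2  = 1;   // updated with blocking assignments

    always #5 clock = ~clock;

    // Both right-hand sides are sampled before either target changes,
    // so the two registers exchange values on every rising edge.
    always @(posedge clock) begin
        nb1 <= nb2;
        nb2 <= nb1;
    end

    // b1 is overwritten first, so b2 simply receives the new b1
    // and the original value of b1 is lost.
    always @(posedge clock) begin
        b1 = b2;
        b2 = b1;
    end

    initial begin
        $monitor("t=%0t nb1=%b nb2=%b b1=%b b2=%b", $time, nb1, nb2, b1, b2);
        #22 $finish;
    end
endmodule
```

In a simulator, the nb pair should keep exchanging values on each rising clock edge, while the b pair settles to equal values after the first edge.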
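module counter (rst, clk, cet, cep, count, tc);   // header missing in the source; name and port list assumed
parameter size = 5;
parameter length = 20;
input rst;                   // declaration assumed from the sensitivity list
input clk;                   // declaration assumed from the sensitivity list
input cet;
input cep;
output [size-1:0] count;
output tc;
reg [size-1:0] count;        // Signals assigned
                             // within an always
                             // (or initial) block
always @ (posedge clk or posedge rst)
  if (rst)                   // reset of the cntr (condition assumed from the else branch)
    count <= {size{1'b0}};
  else
    begin
      if (count == length-1)
        count <= {size{1'b0}};
      else
        count <= count + 1'b1;   // branch body missing in the source; increment assumed
    end
// The logic using cet and cep and driving tc is not shown in the source.
endmodule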
Ex4: An example of delays:
reg a, b, c, d;
wire e;
always @(b or e)
  begin
    a = b & e;
    b = a | b;
    #5 c = b;
    d = #6 c ^ e;
  end
The always clause above illustrates the other way the construct is used:
it executes any time any of the entities in the sensitivity list changes, i.e. when b or e
changes. When one of these changes, a is immediately assigned a new value, and due
to the blocking assignment, b is assigned a new value afterward (taking into account
the new value of a). After a delay of 5 time units, c is assigned the value of b and the
value of c ^ e is tucked away in an invisible store.
Then, after 6 more time units, d is assigned the value that was tucked
away. Signals that are driven from within a process (an initial or always block) must
be of type reg. Signals that are driven from outside a process must be of type wire.
The keyword reg does not necessarily imply a hardware register.
4.4 CONSTANTS
Examples:
There are several statements in Verilog that have no analog in real hardware,
e.g. $display. Consequently, much of the language cannot be used to describe
hardware. The examples presented here are the classic subset of the language that has
a direct mapping to real gates.
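// The first example uses continuous assignment.
wire out;
assign out = sel ? a : b;
// The second example uses a case statement in a procedure.
reg out;
always @(a or b or sel)
  begin
    case (sel)
      1'b0: out = b;
      1'b1: out = a;
    endcase
  end
// The third example uses if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
  if (sel)
    out = a;
  else
    out = b;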
The next interesting structure is a transparent latch; it will pass the input to
the output when the gate signal is set for "pass-through", and captures the input and
stores it upon transition of the gate signal to "hold".
The output will remain stable regardless of the input signal while the gate
is set to "hold". In the example below the "pass-through" level of the gate would be
when the value of the if clause is true, i.e. gate = 1.
This is read "if gate is true, the din is fed to latch_out continuously." Once
the if clause is false, the last value at latch_out will remain and is independent of the
value of din.
reg latch_out;
always @(gate or din)
  if (gate)
    latch_out = din;   // pass-through; with no else branch the value is held while gate is low
The flip-flop is the next significant template; in Verilog, the D-flop is the
simplest, and it can be modeled as:
reg q;
always @(posedge clk)
  q <= d;
The significant thing to notice in the example is the use of the non-blocking
assignment. A basic rule of thumb is to use <= when there is a
posedge or negedge statement within the always clause.
A variant of the D-flop is one with an asynchronous reset; by convention, the
reset state is the first if clause within the statement.
reg q;
always @(posedge clk or posedge reset)
  if (reset)
    q <= 0;
  else
    q <= d;
The next variant is including both an asynchronous reset and asynchronous set
condition; again the convention comes into play, i.e. the reset term is followed by the
set term.
reg q;
always @(posedge clk or posedge reset or posedge set)
  if (reset)
    q <= 0;
  else
    if (set)
      q <= 1;
    else
      q <= d;
Note: If this model is used to model a Set/Reset flip flop then simulation
errors can result. Consider the following test sequence of events: 1) reset goes high, 2)
clk goes high, 3) set goes high, 4) clk goes high again, 5) reset goes low, followed by 6)
set going low. Assume no setup and hold violations.
In this example the always @ statement would first execute when the rising
edge of reset occurs which would place q to a value of 0. The next time the always
block executes would be the rising edge of clk which again would keep q at a value of
0. The always block then executes when set goes high which because reset is high
forces q to remain at 0.
This condition may or may not be correct depending on the actual flip flop.
However, this is not the main problem with this model. Notice that when reset goes
low, set is still high. In a real flip flop this will cause the output to go to a 1.
However, in this model it will not occur because the always block is triggered
by rising edges of set and reset - not levels. A different approach may be necessary for
set/reset flip flops.
Note that there are no "initial" blocks mentioned in this description. There is a
split between FPGA and ASIC synthesis tools on this structure. FPGA tools allow
initial blocks where reg values are established instead of using a "reset" signal.
ASIC synthesis tools don't support such a statement. The reason is that an
FPGA's initial state is something that is downloaded into the memory tables of the
FPGA. An ASIC is an actual hardware implementation.
There are two separate ways of declaring a Verilog process. These are
the always and the initial keywords. The always keyword indicates a free-running
process. The initial keyword indicates a process that executes exactly once. Both
constructs begin execution at simulator time 0, and both execute until the end of the
block. Once an always block has reached its end, it is rescheduled. It is a
common misconception to believe that an initial block will execute before an always
block. In fact, it is better to think of the initial block as a special case of the always
block, one which terminates after it completes for the first time.
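//Examples:
initial
  begin
  end
always @(a or b)   // event control missing in the source; sensitivity list assumed
  begin
    if (a)
      c = b;
    else
      d = ~b;
  end // Done with this block, now return to the top (i.e. the @ event-control)
always @(posedge a)   // event control missing in the source; assumed
  a <= b;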
These are the classic uses for these two keywords, but there are two significant
additional uses. The most common of these is an always keyword without the @(...)
sensitivity list. It is possible to use always as shown below:
always
The always keyword acts similar to the "C" construct while(1) {..} in the sense
that it will execute forever.
The other interesting exception is the use of the initial keyword with the
addition of the forever keyword.
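Both additional uses are typically seen in testbench clock generators. The following is a minimal sketch; the module name clock_gen_demo and the signal names clk_a and clk_b are hypothetical, and the one-time-unit half period is chosen only for illustration.

```verilog
// Hypothetical testbench fragment: two equivalent free-running clocks.
module clock_gen_demo;
    reg clk_a;
    reg clk_b;

    // "always" without a sensitivity list. The delays are essential:
    // without them the block would loop forever in zero simulated time.
    always begin
        clk_a = 0; #1;
        clk_a = 1; #1;
    end

    // "initial" combined with "forever": initialize once, then toggle.
    initial begin
        clk_b = 0;
        forever #1 clk_b = ~clk_b;
    end

    // Stop the simulation after ten time units.
    initial #10 $finish;
endmodule
```

Neither form describes synthesizable hardware; they are simulation constructs.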
4.7 OPERATORS
Type         Symbol       Operation
Bitwise      ^            Bitwise XOR
             ~^ or ^~     Bitwise XNOR
Logical      !            NOT
             &&           AND
             ||           OR
Reduction    |            Reduction OR
             ~|           Reduction NOR
             ^            Reduction XOR
             ~^ or ^~     Reduction XNOR
Arithmetic   +            Addition
             -            Subtraction
             /            Division
             **           Exponentiation (*Verilog-2001)
Conditional  ?:           Conditional
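A few of the operators listed above can be exercised in a short simulation. This is an illustrative sketch only; the module name operator_demo and the operand values are arbitrary choices, not taken from the design.

```verilog
// Hypothetical demonstration of several operator classes.
module operator_demo;
    reg [3:0] a, b;

    initial begin
        a = 4'b1011;
        b = 4'b0110;
        $display("bitwise   a ^ b   = %b", a ^ b);    // bit-by-bit XOR
        $display("bitwise   a ~^ b  = %b", a ~^ b);   // bit-by-bit XNOR
        $display("reduction |a      = %b", |a);       // OR of all bits of a
        $display("reduction ^a      = %b", ^a);       // XOR of all bits (parity)
        $display("logical   a && b  = %b", a && b);   // true if both are nonzero
        $display("power     2 ** 3  = %0d", 2 ** 3);  // Verilog-2001 exponentiation
        $display("cond.     larger  = %0d", (a > b) ? a : b);
    end
endmodule
```

Note that ^ appears twice in the table: with two operands it is the bitwise XOR, and with one operand it is the reduction XOR.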
Verilog and VHDL are hardware description languages that are used to write
programs for electronic chips. These languages are used in electronic devices that do
not share a computer's basic architecture. VHDL is the older of the two, and is based
on Ada and Pascal, thus inheriting characteristics from both languages. Verilog is
relatively recent, and follows the coding methods of the C programming language.
VHDL is a strongly typed language, and code that violates its typing rules
will not compile. A strongly typed language like VHDL does not allow
variables of different types to be intermixed or operated on together. Verilog uses weak
typing, which is the opposite of strong typing. Another difference is
case sensitivity. Verilog is case sensitive, and would not recognize a variable if the
case used is not consistent with what it was previously. On the other hand, VHDL is
not case sensitive, and users can freely change the case, as long as the characters in
the name, and their order, stay the same.
In general, Verilog is easier to learn than VHDL. This is due, in part, to the
popularity of the C programming language, making most programmers familiar with
the conventions that are used in Verilog. VHDL is a little bit more difficult to learn
and program.
VHDL has the advantage of having a lot more constructs that aid in high-level
modeling, and it reflects the actual operation of the device being programmed.
Complex data types and packages are very desirable when programming big and
complex systems that might have a lot of functional parts. Verilog has no concept of
packages, and all programming must be done with the simple data types that are
provided by the language.
5. Verilog has very simple data types, while VHDL allows users to create more
complex data types.
• The complexity of ASIC and FPGA designs has meant an increase in the number of
specific tools and libraries of macro and mega cells written in either VHDL or Verilog.
• The Verilog hardware description language has been used far longer than VHDL and
has been
Basic constructs
Verilog – hierarchy
• A module is the basic unit of the model, and it may be composed of instances of
other modules
Verilog – hierarchy
Verilog
– Output
– Inout
– Task/Function definitions
Verilog - parameters
Verilog - nets
• Nets are the things that connect model components together – like signals in VHDL
• Example:
wire w1, w2;
tri [31:0] bus32;
• Each net type has functionality that is used to model different types of hardware such
as CMOS, NMOS, TTL etc.
• If there is more than one driver, the value of the net is determined by a built-in
resolution function
Verilog - registers
• Registers can be used as the source for a primitive or module instance (i.e. registers
can be connected to input ports), but they cannot be driven in the same way a net can
• Examples:
– reg r1, r2;
– Reg:
• This is the generic register data type. A reg declaration can specify registers which
are 1 bit wide to 1 million bits wide
– Integer
Verilog - memories
Verilog - primitives
• Examples:
module test;
reg ain, bin;
Procedural blocks
• Procedural blocks are the part of the language which represents sequential behavior
• A module can have as many procedural blocks as necessary
• These blocks are sequences of executable statements
• The statements in each block are executed sequentially, but the blocks themselves
are concurrent and asynchronous to other blocks
• There are two types of procedural blocks, initial blocks and always blocks
• All initial and always blocks contain a single statement, which may be a compound
statement,
e.g.:
initial
begin statement1 ; statement2 ; ... end
• Because the statement may be a compound statement, this may entail executing lots
of statements
• An initial block may cause activity to occur throughout the entire simulation of the
model
x = 1; // an initialization
y = f(x);
• The only difference between an always block and an initial block is that when the
always statement finishes execution, it starts executing again
Verilog – Tasks/functions
Verilog – Tasks
• Tasks may have zero or more arguments, and they may be input, output, or inout
arguments
• Time can elapse during the execution of a task, according to time and event controls
in the task definition
• Example:
task do_read;
begin
Verilog – Functions
• In contrast to tasks, functions must execute in a single instant of simulated time
• That is, no time or delay controls are allowed in a function
• Function arguments are also restricted to inputs only.
input [3:0] relocation_factor;
begin
relocate = addr + (relocation_factor<<12);
assign absolute_address = relocate(relative_address, rf);
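Since the task and function fragments above are incomplete, the following self-contained sketch shows how the two constructs differ. It is an illustration under stated assumptions: the 32-bit width of addr and of the return value, the module name task_function_demo, and the task delayed_copy are all hypothetical and are not recovered from the source.

```verilog
// Hypothetical module contrasting a function with a task.
module task_function_demo;
    reg  [31:0] relative_address;
    reg  [3:0]  rf;
    wire [31:0] absolute_address;
    reg  [31:0] data;

    // Function: inputs only, no delays, returns in zero simulated time.
    function [31:0] relocate;            // return width assumed
        input [31:0] addr;               // width assumed
        input [3:0]  relocation_factor;
        begin
            relocate = addr + (relocation_factor << 12);
        end
    endfunction

    assign absolute_address = relocate(relative_address, rf);

    // Task: may have output arguments and may consume simulated time.
    task delayed_copy;                   // hypothetical task
        input  [31:0] src;
        output [31:0] dst;
        begin
            #10 dst = src;               // time elapses inside the task
        end
    endtask

    initial begin
        relative_address = 32'h0000_0010;
        rf = 4'd2;
        #1 delayed_copy(absolute_address, data);
        $display("absolute=%h data=%h at t=%0t", absolute_address, data, $time);
    end
endmodule
```

The function can be used inside a continuous assignment because it completes instantly, whereas the task is called as a statement from a procedural block.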
VHDL/Verilog comparison
Capability
• The choice of which to use is not therefore based solely on technical capability but
on:
– personal preferences
– EDA tool availability
– commercial, business and marketing issues
Compilation
• VHDL:
– Care must be taken with both the compilation order of code written in a single
file and the compilation order of multiple files.
– This may mean dedicated conversion functions are needed to convert objects from
one type to another.
– The choice of which data types to use should be considered wisely, especially
enumerated (abstract) data types.
– VHDL may be preferred because it allows a multitude of language or user defined
data types to be used.
• Verilog:
– Compared to VHDL, Verilog data types are very simple, easy to use and very much
geared towards modeling hardware structure as opposed to abstract
hardware modeling.
– Unlike VHDL, all data types used in a Verilog model are defined by the Verilog
language and not by the user.
– There are net data types, for example wire, and a register data type called reg.
– A model with a signal whose type is one of the net data types has a corresponding
electrical wire in the implied modeled circuit.
– Verilog may be preferred because of its simplicity.
Design reusability
• VHDL:
– Procedures and functions may be placed in a package so that they are available to
any design-unit that wishes to use them
• Verilog:
– There is no concept of packages in Verilog.
– Functions and procedures used within a model must be defined in the module.
– First, it is very strongly typed; a feature that makes it robust and powerful for the
advanced user after a longer learning phase.
– Second, there are many ways to model the same circuit, especially those with large
hierarchical structures
High level constructs
• VHDL:
– There are more constructs and features for high-level modeling in VHDL than there
are in Verilog.
• generic statements for generic models that can be individually characterized, for
example, bit width.
– All these language statements are useful in synthesizable models.
• Verilog:
– Simple two input logical operators are built into the language, they are: NOT, AND,
OR, NAND, NOR, XOR and XNOR.
– Any timing must be separately specified using the after clause.
– Separate constructs defined under the VITAL language must be used to define the
cell primitives of ASIC and FPGA libraries.
• Verilog:
– The Verilog language was originally developed with gate level modeling in mind,
and so has very good constructs for modeling at this level and for modeling the cell
primitives of ASIC and FPGA libraries.
– Examples include User Defined Primitives (UDP), truth tables and the specify block
for specifying timing delays across a module.
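The gate-level constructs mentioned above can be sketched as follows. This is an illustration only: the primitive mux_udp, the wrapper mux_cell and the delay values are hypothetical and do not come from any vendor library.

```verilog
// Hypothetical 2-to-1 multiplexer described as a User Defined Primitive.
// The truth table selects a when sel is 1 and b when sel is 0.
primitive mux_udp (y, sel, a, b);
    output y;
    input  sel, a, b;
    table
    //  sel a  b  :  y
        0   ?  0  :  0;
        0   ?  1  :  1;
        1   0  ?  :  0;
        1   1  ?  :  1;
        x   0  0  :  0;   // unknown select, but both inputs agree
        x   1  1  :  1;
    endtable
endprimitive

// Cell wrapper: the specify block attaches pin-to-pin delays.
module mux_cell (y, sel, a, b);
    output y;
    input  sel, a, b;

    mux_udp u1 (y, sel, a, b);

    specify
        (a   => y) = 2;   // delay values are illustrative only
        (b   => y) = 2;
        (sel => y) = 3;
    endspecify
endmodule
```

Library cells are commonly modeled in this style, with the logic in a primitive and the timing kept separately in the specify block.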
Managing large designs
• VHDL:
• Verilog:
– There are no statements in Verilog that help manage large designs
Operators
• The majority of operators are the same between the two languages.
• Verilog does have very useful unary reduction operators that are not in VHDL.
• VHDL has the mod operator that is not found in Verilog.
Procedures and tasks
• VHDL:
– concurrent procedure calls are allowed
• Verilog:
– concurrent procedure calls are not allowed
Readability
• Verilog is more like C because its constructs are based approximately 50% on C and
50% on Ada.
• For this reason an existing C programmer may prefer Verilog over VHDL.
Structural replication
• VHDL:
• Verilog:
• VHDL:
– It does mean models are often more verbose, and the code often longer, than its
Verilog equivalent.
• Verilog:
– Signals representing objects of different bit widths may be assigned to each other.
– The signal representing the smaller number of bits is automatically padded out to
that of the larger number of bits, and is independent of whether it is the assigned signal
or not.
– Unused bits will be automatically optimized away during the synthesis process.
– This has the advantage of not needing to model quite so explicitly as in VHDL, but
does mean unintended modeling errors will not be identified by an analyzer.
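The width behaviour described in the Verilog bullets above can be seen in a few lines of simulation code. This is a minimal sketch; the module name width_demo and the signal names narrow and wide are hypothetical.

```verilog
// Hypothetical demonstration of assignments between different bit widths.
module width_demo;
    reg [3:0] narrow;
    reg [7:0] wide;

    initial begin
        narrow = 4'b1010;
        wide   = narrow;         // unsigned value is zero-extended
        $display("wide   = %b", wide);     // prints 00001010

        wide   = 8'b1111_0101;
        narrow = wide;           // upper four bits are silently discarded
        $display("narrow = %b", narrow);   // prints 0101
    end
endmodule
```

Neither assignment is an error in Verilog, which is convenient but also means an accidental width mismatch goes unreported unless a lint tool is used.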
Examples
Binary up counter
• VHDL:
process (clock)
begin
• Verilog:
counter <= counter + 1;
D Flip-Flop
• VHDL:
process (<clock>)
begin
<output> <= <input>;
end if;
end process;
• Verilog:
Synchronous multiplier
• VHDL:
process (<clock>)
begin
end process;
• Verilog:
wire [17:0] <a_input>;
always @(posedge <clock>)
<product> <= <a_input> * <b_input>;
CHAPTER 5
XILINX
Note: After you convert your project, you cannot open it in previous versions of the
ISE software, such as the ISE 11 software. However, you can optionally create a
backup of the original project as part of project migration, as described below.
Note You may need to change the extension in the Files of type field
to display .npl (ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10
software) project files.
iii. In the dialog box that appears, select Backup and Migrate or Migrate Only.
iv. The ISE software automatically converts your project to an ISE 12 project.
Note If you chose to Backup and Migrate, a backup of the original project is
created at project_name_ise12migration.zip.
5.2 PROPERTIES
For information on properties that have changed in the ISE 12 software, see
ISE 11 to ISE 12 Properties Conversion.
5.2.1 IP Modules
If your design includes IP modules that were created using CORE Generator™
software or Xilinx® Platform Studio (XPS) and you need to modify these modules,
you may be required to update the core. However, if the core netlist is present and
you do not need to modify the core, updates are not required and the existing netlist
is used during implementation.
To help familiarize you with the ISE software and with FPGA and CPLD
designs, a set of example designs is provided with Project Navigator. The examples
show different design techniques and source types, such as VHDL, Verilog,
schematic, or EDIF, and include different constraints and IP.
To Open an Example
Note To help you choose an example project, the Project Description field describes
each project. In addition, you can scroll to the right to see additional fields, which
provide details about the project.
Note If you modified an example project and want to overwrite it with the
original example project, select File > Open Example, select the Sample Project
Name, and specify the same Destination Directory you originally used. In the dialog
box that appears, select Overwrite the existing project and click OK.
Project Navigator allows you to manage your FPGA and CPLD designs using
an ISE® project, which contains all the source files and settings specific to your
design. First, you must create a project, then add source files and set process
properties. After you create a project, you can run processes to implement, constrain,
and analyze your design. Project Navigator provides a wizard to help you create a
project as follows.
Note If you prefer, you can create a project using the New Project dialog box instead of the New
Project Wizard. To use the New Project dialog box, deselect the
Use New Project wizard option in the ISE General page of the Preferences dialog
box.
To Create A Project
1. Select File > New Project to launch the New Project Wizard.
2. In the page, set the name, location, and project type, and click Next.
3. For EDIF or NGC/NGO projects only: In the Project page select the input and
constraint file for the project, and click Next.
4. In the Project Settings page, set the device and project properties, and click
Next.
5. In the Project Summary page, review the information, and click Finish to
create the project.
You can create a copy of a project to experiment with different source options
and implementations. Depending on your needs, the design source files for the copied
project and their location can vary as follows:
o Design source files are left in their existing location, and the copied project
points to these files.
o Design source files, including generated files, are copied and placed in a specified
directory.
o Design source files, excluding generated files, are copied and placed in a specified
directory.
Copied projects are the same as other projects in both form and function. For
example, you can do the following with copied projects:
• Open the copied project using the File > Open Project menu command.
• View, modify, and implement the copied project.
• Use the Project Browser to view key summary data for the copied project and then
open the copied project for further analysis and implementation, as described in
Alternatively, you can create an archive of your project, which puts all of the
project contents into a ZIP file. Archived projects must be unzipped before being
opened in Project Navigator. For information on archiving, see Creating a Project
Archive.
2. In the Copy Project dialog box, enter the Name for the copy.
Note The name for the copy can be the same as the name for the project, as long as
you specify a different location.
By default, this is blank, and the working directory is the same as the project
directory. However, you can specify a working directory if you want to keep your
ISE® project file (.xise extension) separate from your working area.
The description can be useful in identifying key traits of the project for reference
later.
• Keep sources in their current locations - to leave the design source files in their
existing location.
If you select this option, the copied project points to the files in their existing
location. If you edit the files in the copied project, the changes also appear in the
original project, because the source files are shared between the two projects.
• Copy sources to the new location - to make a copy of all the design source files
and place them in the specified Location directory.
If you select this option, the copied project points to the files in the specified
directory. If you edit the files in the copied project, the changes do not appear in the
original project, because the source files are not shared between the two projects.
Optionally, select Copy files from Macro Search Path directories to copy
files from the directories you specify in the Macro Search Path property in the
Translate dialog box. All files from the specified directories are copied, not just the
files used by the design.
Note: If you added a netlist source file directly to the project as described in
Working with Netlist-Based IP, the file is automatically copied as part of
Copy Project because it is a project source file. Adding netlist source files to the
project is the preferred method for incorporating netlist modules into your design,
because the files are managed automatically by Project Navigator.
Optionally, click Copy Additional Files to copy files that were not included
in the original project. In the Copy Additional Files dialog box, use the Add Files and
Remove Files buttons to update the list of additional files to copy. Additional files are
copied to the copied project location after all other files are copied. To exclude
generated files from the copy, such as implementation results and reports, select
Exclude Generated Files From The Copy. When you select this option, the copied
project opens in a state in which processes have not yet been run.
7. To automatically open the copy after creating it, select Open the copied project.
Note By default, this option is disabled. If you leave this option disabled, the original
project remains open after the copy is made.
Click OK.
To Archive A Project:
2. In the Project Archive dialog box, specify a file name and directory for the
ZIP file.
4. Click OK.
A ZIP file is created in the specified directory. To open the archived project,
you must first unzip the ZIP file, and then, you can open the project.
Note Sources that reside outside of the project directory are copied into a remote
sources subdirectory in the project archive.
CHAPTER 6
SIMULATION RESULTS
RTL SCHEMATIC
TECHNOLOGY SCHEMATIC
DESIGN SUMMARY
SYNTHESIS REPORT
TABLE OF CONTENTS
2) HDL Compilation
4) HDL Analysis
5) HDL Synthesis
8) Partition Report
9) Final Report
=========================================================================
=========================================================================
Safe Implementation : No
FSM Style : LUT
Optimization Effort : 1
Keep Hierarchy : No
Hierarchy Separator : /
=======================================================================
=========================================================================
* HDL Compilation *
=========================================================================
No errors in compilation
Design Hierarchy
* Analysis *
=========================================================================
* HDL Analysis *
=========================================================================
* HDL Synthesis *
=========================================================================
WARNING:Xst:737 - Found 1-bit latch for signal <miss>. Latches may be generated from incomplete case or if
statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing
problems.
INFO:Xst:2371 - HDL ADVISOR - Logic functions respectively driving the data and gate enable inputs of this
latch share common terms. This situation will potentially lead to setup/hold violations and, as a result, to
simulation problems. This situation may come from an incomplete case statement (all selector values are not
covered). You should carefully review if it was in your intentions to describe such a latch.
WARNING:Xst:737 - Found 1-bit latch for signal <match>. Latches may be generated from incomplete case or
if statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing
problems.
INFO:Xst:2371 - HDL ADVISOR - Logic functions respectively driving the data and gate enable inputs of this
latch share common terms. This situation will potentially lead to setup/hold violations and, as a result, to
simulation problems. This situation may come from an incomplete case statement (all selector values are not
covered). You should carefully review if it was in your intentions to describe such a latch.
WARNING:Xst:737 - Found 1-bit latch for signal <fault>. Latches may be generated from incomplete case or if
statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing
problems.
INFO:Xst:2371 - HDL ADVISOR - Logic functions respectively driving the data and gate enable inputs of this
latch share common terms. This situation will potentially lead to setup/hold violations and, as a result, to
simulation problems. This situation may come from an incomplete case statement (all selector values are not
covered). You should carefully review if it was in your intentions to describe such a latch.
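The latch warnings above name the signals miss, match and fault. The design source is not shown in the report, so the following is only a hedged sketch of the usual remedy: assign a default value to every output at the top of the combinational block so that no path leaves an output unassigned. The module name decision_no_latch, the inputs d, tmax and rmax, their widths, and the mapping of distance ranges to outputs are all assumptions made for illustration.

```verilog
// Hypothetical decision logic written so that no latch is inferred.
module decision_no_latch (
    input  wire [3:0] d,      // Hamming distance (width assumed)
    input  wire [3:0] tmax,   // correctable-error limit (assumed input)
    input  wire [3:0] rmax,   // detectable-error limit (assumed input)
    output reg        match,
    output reg        miss,
    output reg        fault
);
    always @(*) begin
        // Defaults first: every output is assigned on every path,
        // so the block describes pure combinational logic.
        match = 1'b0;
        miss  = 1'b0;
        fault = 1'b0;

        if (d <= tmax)
            match = 1'b1;     // within the correctable range (assumed mapping)
        else if (d <= rmax)
            fault = 1'b1;     // detectable but not correctable (assumed mapping)
        else
            miss  = 1'b1;     // beyond the detectable range (assumed mapping)
    end
endmodule
```

An if or case statement that assigns an output in only some branches is what typically produces warning Xst:737; adding defaults or a final else branch removes the implied storage.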
=========================================================================
Macro Statistics
# Latches : 3
1-bit latch : 3
# Xors : 20
1-bit xor2 : 20
=========================================================================
=========================================================================
=========================================================================
=========================================================================
Macro Statistics
# Latches : 3
1-bit latch : 3
# Xors : 20
1-bit xor2 : 20
=========================================================================
=========================================================================
Found no macro
=========================================================================
=========================================================================
* Partition Report *
=========================================================================
-------------------------------
No Partitions were found in this design.
=========================================================================
* Final Report *
=========================================================================
Final Results
RTL Top Level Output File Name     : proposedcodewords.ngr
Top Level Output File Name         : proposedcodewords
Keep Hierarchy                     : No
Design Statistics
# IOs                              : 16
Cell Usage :
# BELS                             : 24
#      GND                         : 1
#      LUT2                        : 6
#      LUT3                        : 3
#      LUT4                        : 14
# FlipFlops/Latches                : 3
#      LD                          : 3
# IO Buffers                       : 16
#      IBUF                        : 13
#      OBUF                        : 3
=========================================================================
Number of IOs: 16
---------------------------
=========================================================================
TIMING REPORT
Clock Information:
------------------
--------------------------------------+------------------------+-------+
Clock Signal                          | Clock buffer(FF name)  | Load  |
--------------------------------------+------------------------+-------+
u23/miss_not0001(u23/miss_not00011:O) | NONE(*)(u23/miss)      | 3     |
--------------------------------------+------------------------+-------+
(*) This 1 clock signal(s) are generated by combinatorial logic,
and XST is not able to identify which are the primary clock signals.
Please use the CLOCK_SIGNAL constraint to specify the clock signal(s) generated by combinatorial logic.
INFO:Xst:2169 - HDL ADVISOR - Some clock signals were not automatically buffered by XST with
BUFG/BUFR resources. Please use the buffer_type constraint in order to insert these buffers to the
clock signals to help prevent skew problems.
----------------------------------------
Timing Summary:
---------------
Speed Grade: -5
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
=========================================================================
-------------------------------------------------------------------------
                            Gate     Net
Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
----------------------------------------  ------------
IBUF:I->O              2   1.106   0.532  k_2_IBUF (k_2_IBUF)
LUT2:I0->O             4   0.612   0.651  u3/Mxor_y_Result1 (w3)
LUT4:I0->O             2   0.612   0.532  u18/Mxor_sum_Result1 (w25)
LUT3:I0->O             2   0.612   0.449  u21/carry1 (t4)
LUT4:I1->O             3   0.612   0.603  u23/match_mux000011 (u23/N2)
LUT3:I0->O             1   0.612   0.000  u23/match_mux00002 (u23/match_mux0000)
Number of errors   :    0 (   0 filtered)
Number of warnings :    3 (   0 filtered)
Number of infos    :    4 (   0 filtered)
SIMULATION RESULTS
FUTURE SCOPE
An efficient processing architecture has been presented to further minimize the
latency and complexity. Because the proposed architecture reduces both the latency
and the complexity considerably, it can be regarded as a promising solution for the
comparison of ECC-protected data. As future work, the DMC technique can be
formulated within the scope of this project to assure consistency in memory while
reducing latency and complexity.
CONCLUSION
APPENDIX A
SOURCE CODE
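module tb_bwa_g;
// Inputs
reg [3:0] a;
reg [3:0] b;
// Outputs
wire [7:0] out;
// Unit under test (instance name uut is assumed; ports follow module bwa_g)
bwa_g uut (.a(a), .b(b), .out(out));
initial begin
// Initialize Inputs
a = 4'hf;
b = 4'hf;
#100;
a = 4'h8;
b = 4'ha;
end
endmodule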
module tb_bwa_n;
// Inputs
reg [3:0] a;
reg [3:0] b;
// Outputs
wire [2:0] out;
// Unit under test (instance name uut is assumed; ports follow module bwa_n)
bwa_n uut (.a(a), .b(b), .out(out));
initial begin
// Initialize Inputs
a = 4'hf;
b = 4'hf;
#100;
a = 4'h8;
b = 4'ha;
end
endmodule
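module tb_code8_4;
// Inputs
reg [3:0] a;
reg [3:0] b;
// Outputs
wire [5:0] out;
// Unit under test (instance name uut is assumed; ports follow module code8_4)
code8_4 uut (.a(a), .b(b), .out(out));
initial begin
a = 4'hf;
b = 4'hf;
#100;
a = 4'h8;
b = 4'ha;
end
endmodule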
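module tb_decicion;
// Inputs
reg a;
reg b;
reg c;
reg d;
reg e;
reg f;
reg en;
// Outputs
wire miss;
wire fault;
wire match;
// Unit under test; the instance name uut and the input connections are
// reconstructed from the port list of module decision
decision uut (
.a(a),
.b(b),
.c(c),
.d(d),
.e(e),
.f(f),
.miss(miss),
.fault(fault),
.match(match),
.en(en)
);
integer i;
// Assert en for 20 time units, then release it
initial begin
en = 1;
#20;
en=0;
end
// Upper three bits: all zeros first, all ones after 350 time units
initial begin
{a,b,c}=3'b000;
#350;
{a,b,c}=3'b111;
end
// Lower three bits: step through every value (i wraps modulo 8)
initial begin
for(i=0;i<16;i=i+1)
begin
#20;
{d,e,f}=i;
#20;
end
end
endmodule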
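module tb_proposedcodewords;
// Inputs
reg [7:0] n;
reg [3:0] k;
reg en;
// Outputs
wire miss,match,fault;
// Unit under test (instance name uut is assumed; ports follow module proposedcodewords)
proposedcodewords uut (.n(n), .k(k), .miss(miss), .match(match), .fault(fault), .en(en));
initial begin
en=1'b1;
#20;
en=1'b0;
end
initial
begin
n=8'hff;
k=4'hf;
#100;
n=8'hd9;
k=4'h7;
#100;
n=8'h00;
k=4'h5;
end
endmodule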
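module tb_sa_arch;
// Inputs
reg [7:0] a;
reg [7:0] b;
// Outputs
wire sum;
// Unit under test (instance name uut is assumed; ports follow module sa_arch)
sa_arch uut (.a(a), .b(b), .sum(sum));
initial begin
// Initialize Inputs
a = 8'h12;
b = 8'h34;
#100;
a = 8'hab;
b = 8'hcd;
#100;
a = 8'hff;
b = 8'hff;
end
endmodule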
module xor_gate(a,b,y);
input a,b;
output y;
assign y=a^b;
endmodule
module half_adder(a,b,sum,carry);
input a,b;
output sum,carry;
assign sum=a^b;
assign carry=a&b;
endmodule
module or_gate(a,b,y);
input a,b;
output y;
assign y=a|b;
endmodule
module sa(a,b,sum);
input a,b;
output sum;
assign sum=a^b;
endmodule
module bwa_g(a,b,out);
input [3:0]a,b;
output [7:0]out;
wire w1,w2,w3,w4,w5,w6,w7,w8,w9,
w10,w11,w12,w13,w14,w15,w16;
half_adder u1(.a(a[0]),.b(b[0]),.sum(w1),.carry(w2));
half_adder u2(.a(a[1]),.b(b[1]),.sum(w3),.carry(w4));
half_adder u3(.a(a[2]),.b(b[2]),.sum(w5),.carry(w6));
half_adder u4(.a(a[3]),.b(b[3]),.sum(w7),.carry(w8));
half_adder u5(.a(w1),.b(w3),.sum(w9),.carry(w10));
half_adder u6(.a(w5),.b(w7),.sum(w11),.carry(w12));
half_adder u7(.a(w2),.b(w4),.sum(w13),.carry(w14));
half_adder u8(.a(w6),.b(w8),.sum(w15),.carry(w16));
half_adder u9(.a(w9),.b(w11),.sum(out[0]),.carry(out[1]));
half_adder u10(.a(w13),.b(w15),.sum(out[2]),.carry(out[3]));
half_adder u11(.a(w10),.b(w12),.sum(out[4]),.carry(out[5]));
half_adder u12(.a(w14),.b(w16),.sum(out[6]),.carry(out[7]));
endmodule
module code8_4(a,b,out);
input [3:0]a,b;
output [5:0]out;
wire w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,
w12,w13,w14,w15,w16,w17,w18,w19,w20;
half_adder u1(.a(a[0]),.b(b[0]),.sum(w1),.carry(w2));
half_adder u2(.a(a[1]),.b(b[1]),.sum(w3),.carry(w4));
half_adder u3(.a(a[2]),.b(b[2]),.sum(w5),.carry(w6));
half_adder u4(.a(a[3]),.b(b[3]),.sum(w7),.carry(w8));
half_adder u5(.a(w1),.b(w3),.sum(w9),.carry(w10));
half_adder u6(.a(w5),.b(w7),.sum(w11),.carry(w12));
half_adder u7(.a(w2),.b(w4),.sum(w13),.carry(w14));
half_adder u8(.a(w6),.b(w8),.sum(w15),.carry(w16));
half_adder u9(.a(w9),.b(w13),.sum(out[0]),.carry(out[1]));
half_adder u10(.a(w10),.b(w11),.sum(w17),.carry(w18));
half_adder u11(.a(w14),.b(w15),.sum(w19),.carry(w20));
or_gate u12(.a(w12),.b(w16),.y(out[5]));
half_adder u13(.a(w17),.b(w19),.sum(out[2]),.carry(out[3]));
or_gate u14(.a(w18),.b(w20),.y(out[4]));
endmodule
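module decision(a,b,c,d,e,f,miss,fault,match,en);
input a,b,c,d,e,f;
output reg miss,match,fault;
input en;
// {a,b,c} and {d,e,f} are the six outputs of the adder tree in proposedcodewords
always @(*)
begin
if(en)
begin
// en = 1 clears all three flags
miss=1'b0;
match=1'b0;
fault=1'b0;
end
// {a,b,c} = 0 and {d,e,f} = 0 or 1: match
else if({a,b,c}==3'b000 && {d,e,f}==3'b000)
begin match=1'b1; miss=1'b0; fault=1'b0; end
else if({a,b,c}==3'b000 && {d,e,f}==3'b001)
begin match=1'b1; miss=1'b0; fault=1'b0; end
// {a,b,c} = 0 and {d,e,f} = 2, 3 or 4: fault
else if({a,b,c}==3'b000 && {d,e,f}==3'b010)
begin match=1'b0; miss=1'b0; fault=1'b1; end
else if({a,b,c}==3'b000 && {d,e,f}==3'b011)
begin match=1'b0; miss=1'b0; fault=1'b1; end
else if({a,b,c}==3'b000 && {d,e,f}==3'b100)
begin match=1'b0; miss=1'b0; fault=1'b1; end
// {a,b,c} = 0 and {d,e,f} = 5, 6 or 7: miss
else if({a,b,c}==3'b000 && {d,e,f}==3'b101)
begin match=1'b0; miss=1'b1; fault=1'b0; end
else if({a,b,c}==3'b000 && {d,e,f}==3'b110)
begin match=1'b0; miss=1'b1; fault=1'b0; end
else if({a,b,c}==3'b000 && {d,e,f}==3'b111)
begin match=1'b0; miss=1'b1; fault=1'b0; end
// {a,b,c} = 1: miss for every value of {d,e,f}. The original test
// {d,e,f}=={0,1,2,3,4,5,6,7} is not a legal concatenation and is dropped.
else if({a,b,c}==3'b001)
begin match=1'b0; miss=1'b1; fault=1'b0; end
// There is no final else, so the outputs hold their previous values for
// the remaining {a,b,c} codes; this is the latch inference XST warns about.
end
endmodule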
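The latch warnings in the synthesis report come from the decision module leaving its outputs unassigned on some paths. A minimal sketch of a latch-free variant is given below. It is not the code that was synthesized: the module name decision_nolatch is hypothetical, and treating every nonzero value of {a,b,c} as a miss is an assumption made here for illustration.

```verilog
// Hypothetical latch-free variant of the decision module (illustration only)
module decision_nolatch(a,b,c,d,e,f,miss,fault,match,en);
input a,b,c,d,e,f;
input en;
output reg miss,match,fault;
always @(*)
begin
  // Defaults are assigned first, so every path drives all three outputs
  miss  = 1'b0;
  match = 1'b0;
  fault = 1'b0;
  if (!en)
  begin
    if ({a,b,c} != 3'b000)
      miss = 1'b1;                        // assumption: nonzero upper bits mean a miss
    else
      case ({d,e,f})
        3'b000, 3'b001:         match = 1'b1;
        3'b010, 3'b011, 3'b100: fault = 1'b1;
        default:                miss  = 1'b1;   // 5, 6 or 7
      endcase
  end
end
endmodule
```

Because every branch now assigns miss, match and fault, a synthesis tool has no state to hold and should infer purely combinational logic. The thresholds on {d,e,f} are copied from the listing above.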
module bwa_n(a,b,out);
input [3:0]a,b;
output [2:0]out;
// w11 is added to the declaration; it is driven by u6 and read by u9
wire w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12,w13,w14,w15,w16;
half_adder u1(.a(a[0]),.b(b[0]),.sum(w1),.carry(w2));
half_adder u2(.a(a[1]),.b(b[1]),.sum(w3),.carry(w4));
half_adder u3(.a(a[2]),.b(b[2]),.sum(w5),.carry(w6));
half_adder u4(.a(a[3]),.b(b[3]),.sum(w7),.carry(w8));
half_adder u5(.a(w1),.b(w3),.sum(w9),.carry(w10));
half_adder u6(.a(w5),.b(w7),.sum(w11),.carry(w12));
or_gate u7(.a(w2),.b(w4),.y(w13));
or_gate u8(.a(w6),.b(w8),.y(w14));
half_adder u9(.a(w9),.b(w11),.sum(out[0]),.carry(out[1]));
or_gate u10(.a(w10),.b(w12),.y(w15));
or_gate u11(.a(w13),.b(w14),.y(w16));
or_gate u12(.a(w15),.b(w16),.y(out[2]));
endmodule
module proposedcodewords(n,k,miss,match,fault,en);
input [7:0]n;
input en;
input [3:0]k;
output miss,match,fault;
wire w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,
w11,w12,w13,w14,w15,w16,w17,w18,
w19,w20,w21,w22,w23,w24,w25,w26,w27,w28;
wire t1,t2,t3,t4,t5,t6;
// Bitwise comparison: both halves of n are XORed with the 4-bit tag k
xor_gate u1(.a(n[0]),.b(k[0]),.y(w1));
xor_gate u2(.a(n[1]),.b(k[1]),.y(w2));
xor_gate u3(.a(n[2]),.b(k[2]),.y(w3));
xor_gate u4(.a(n[3]),.b(k[3]),.y(w4));
xor_gate u5(.a(n[4]),.b(k[0]),.y(w5));
xor_gate u6(.a(n[5]),.b(k[1]),.y(w6));
xor_gate u7(.a(n[6]),.b(k[2]),.y(w7));
xor_gate u8(.a(n[7]),.b(k[3]),.y(w8));
// Adder tree built from half adders and OR gates
half_adder u9(.a(w1),.b(w2),.sum(w9),.carry(w10));
half_adder u10(.a(w3),.b(w4),.sum(w11),.carry(w12));
half_adder u11(.a(w5),.b(w6),.sum(w13),.carry(w14));
half_adder u12(.a(w7),.b(w8),.sum(w15),.carry(w16));
half_adder u13(.a(w9),.b(w11),.sum(w17),.carry(w18));
half_adder u14(.a(w10),.b(w12),.sum(w19),.carry(w20));
half_adder u15(.a(w13),.b(w15),.sum(w21),.carry(w22));
half_adder u16(.a(w14),.b(w16),.sum(w23),.carry(w24));
half_adder u17(.a(w17),.b(w21),.sum(t1),.carry(t2));
half_adder u18(.a(w18),.b(w19),.sum(w25),.carry(w26));
half_adder u19(.a(w22),.b(w23),.sum(w27),.carry(w28));
or_gate u20(.a(w20),.b(w24),.y(t6));
half_adder u21(.a(w25),.b(w27),.sum(t3),.carry(t4));
or_gate u22(.a(w26),.b(w28),.y(t5));
// Classify the adder-tree outputs as match, fault or miss
decision u23(.a(t6),.b(t5),.c(t4),.d(t3),.e(t2),.f(t1),.en(en),
.miss(miss),.match(match),.fault(fault));
endmodule
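For checking the structural design in simulation, a behavioural count of differing bits can serve as a reference. The sketch below is not part of the original design: the module name hamming_ref and the output dist are hypothetical. It only assumes the bit pairing visible in xor_gate u1 to u8 above, where both halves of n are compared with k, and it makes no claim about the exact encoding produced by the adder tree.

```verilog
// Hypothetical behavioural reference model (illustration only)
module hamming_ref(n,k,dist);
input  [7:0] n;          // retrieved 8-bit word
input  [3:0] k;          // incoming 4-bit tag
output reg [3:0] dist;   // number of differing bit positions, 0 to 8
reg [7:0] diff;
integer i;
always @(*)
begin
  diff = n ^ {k,k};      // same bit pairing as xor_gate u1 to u8
  dist = 4'd0;
  for (i=0; i<8; i=i+1)
    dist = dist + diff[i];
end
endmodule
```

With the stimuli used in tb_proposedcodewords, this model gives dist = 0 for n = 8'hff, k = 4'hf, dist = 5 for n = 8'hd9, k = 4'h7, and dist = 4 for n = 8'h00, k = 4'h5.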
module sa_arch(a,b,sum);
input [7:0]a,b;
output sum;
wire w1,w2,w3,w4,w5,w6,w7,w8,w9,
w10,w11,w12,w13,w14;
xor_gate u1(.a(a[0]),.b(b[0]),.y(w1));
xor_gate u2(.a(a[1]),.b(b[1]),.y(w2));
xor_gate u3(.a(a[2]),.b(b[2]),.y(w3));
xor_gate u4(.a(a[3]),.b(b[3]),.y(w4));
xor_gate u5(.a(a[4]),.b(b[4]),.y(w5));
xor_gate u6(.a(a[5]),.b(b[5]),.y(w6));
xor_gate u7(.a(a[6]),.b(b[6]),.y(w7));
xor_gate u8(.a(a[7]),.b(b[7]),.y(w8));
// Module ha is not defined in this listing; it is assumed to be a
// sum-only half adder (sum = a ^ b)
ha u9(.a(w1),.b(w2),.sum(w9));
ha u10(.a(w3),.b(w4),.sum(w10));
ha u11(.a(w5),.b(w6),.sum(w11));
ha u12(.a(w7),.b(w8),.sum(w12));
sa u13(.a(w9),.b(w10),.sum(w13));
sa u14(.a(w11),.b(w12),.sum(w14));
sa u15(.a(w13),.b(w14),.sum(sum));
endmodule
APPENDIX B
IEEE BASE PAPER