This paper presents an adaptive encoding framework for reducing transition activity on
high-capacitance off-chip data buses, where power dissipation can be significant in high-speed
communication. The technique observes data characteristics over fixed window sizes and forms
clusters of bit lines with highly correlated switching patterns. The proposed method uses
redundancy in space and time to prevent loss of information when the data are retrieved. We
present analytical and experimental analyses that demonstrate the activity reduction of our
encoding scheme for various data. The extra power cost of the encoder and decoder circuitry,
along with the redundancy, is offset by the reduced number of off-chip transitions.
Introduction:
As CMOS technology progresses into the nanometer regime, it poses many challenges to design
and test engineers. The scaling of VLSI integrated circuits has made CMOS technology more
sensitive to large power dissipation, propagation delays, and various noise mechanisms such as
power supply noise, crosstalk noise, and leakage noise. Power consumption and crosstalk have
become major concerns because of the continuing decrease in minimum feature size and the
corresponding increase in chip density and operating frequencies. Most of the power is wasted
on the data buses and long interconnects as dynamic power dissipation from charging and
discharging internal node capacitances and inter-wire capacitances.
In older, non-deep-submicron technologies the load (substrate) capacitance (CL) between a wire
and the substrate is the dominant factor, while the coupling capacitance (CC) between parallel
wires is negligible in comparison. In nanometer technologies, unfortunately, the coupling
capacitance dominates the load capacitance and can be several times larger. The characteristics
of data buses and long interconnects, such as wire spacing [9], wire length, wire material,
wire width, driver strength, coupling length, and signal transition time, influence the
coupling effect. This increased coupling on on-chip buses and long interconnects not only
increases power dissipation but also degrades signal integrity. As a result, these buses and
interconnects become more sensitive and prone to errors caused by crosstalk and delay faults
[17], [18], [19]. Reducing the power-dissipating transitions can also reduce crosstalk and
delay faults [12], [13]. The coupling effect is data dependent: it increases or decreases with
the relative switching activity between adjacent bus wires [14].
Off-chip data buses play an important role in reliable communication and high-performance
chips. Power is consumed in charging and discharging the coupling and load capacitances
whenever a signal on the data bus makes a transition. Reducing the transition (switching)
activity on the data buses is one of the most attractive ways of reducing power dissipation,
and it can be achieved by employing bus encoding techniques. Several bus encoding techniques
have been proposed in the literature to reduce power consumption during bus transmission.
These techniques mainly rely on reducing the bus activity by reducing either self transitions
or coupling transitions; encoding the data to remove power-dissipating transitions lowers the
bus activity and hence the overall power consumption.
Over the past few years, a number of coding techniques have been proposed for reducing the
transitions on a data bus. One popular scheme is the bus-invert coding technique proposed by
Stan and Burleson [1]. This method compares successive data bus values and determines whether
inverting a data word would result in fewer bit transitions than sending it unchanged. The
technique is suitable for uncorrelated data patterns and is based on the Hamming distance: it
counts the number of bits that change state from one data word to the next. If the number of
bit transitions is greater than half the total number of bus lines, the inverted data word is
transmitted over the bus; otherwise the original data is sent. This method effectively limits
the maximum number of transitions to one half of the number of data bits. Other variants of
bus-invert coding include a decomposition approach [5] and a partial bus coding technique [6].
Both of these techniques carry an area overhead to determine a suitable partition of the data
bus. In addition, the decomposition approach [5] can require up to p-1 extra bus lines, where
p is the number of partitions of the original data bus. The energy dissipated due to coupling
capacitance is analyzed in [7],
[8]. For instruction buses, the Gray code [2], the T0 code [3], and the Beach code [4] have
been proposed; they reduce transitions and thereby reduce power dissipation. The dynamic
coding method takes the even and odd lines as bus sub-groups, finds the coupling transitions,
and then inverts the sub-group that decreases the coupling transitions [11]. The bus
regrouping method divides the bus into small sub-groups and then regroups them by taking bits
from different sub-groups [15]. In the novel coding technique of [16], the data bus is
subdivided into even and odd bit groups; the Hamming distances of the even sub-group, the odd
sub-group, and the inverted data with respect to the present data on the bus are compared, and
the sub-group with the smaller Hamming distance is inverted and transmitted with two redundant
control bits for decoding. One disadvantage of this technique is that coupling transitions
also occur on the redundant bits. In almost all of the methods mentioned above, only coupling
transitions are considered, and self transitions are either neglected or cannot be reduced.
The proposed method, bus regrouping with Hamming distance, reduces both coupling and self
transitions, which results in greater power savings.
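The bus-invert decision described above can be sketched in a few lines of Python (an
illustrative model of the scheme in [1]; the function name and 8-bit bus width are assumptions
for the example):

```python
def bus_invert_encode(prev_word, word, width=8):
    """Bus-invert coding: send the inverted word (invert line = 1)
    when more than half the bus lines would otherwise toggle."""
    mask = (1 << width) - 1
    hamming = bin((prev_word ^ word) & mask).count("1")  # bits that would toggle
    if hamming > width // 2:
        return (~word) & mask, 1   # transmit inverted word, invert line high
    return word & mask, 0          # transmit word unchanged

# Previous bus value 0x00, next word 0xFE would toggle 7 of 8 lines,
# so the encoder sends the inverted word 0x01 (1 toggle) with the invert line set.
assert bus_invert_encode(0x00, 0xFE) == (0x01, 1)
```

The decoder simply XORs the received word with the invert line replicated across the bus,
which is why the scheme needs exactly one redundant line.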
Switching activity:
Consider a typical pipelined circuit in which each stage consists of a combinational circuit between two latches.
At the beginning of a clock cycle, the processor first latches the input signals of the combinational
circuit in latch A. It then evaluates those signals in the combinational circuit and propagates them to
output. Output is latched in latch B during the next cycle and becomes the input signals for the next
pipeline stage. The switching activity of combinational circuits depends on the logic and structure of the
circuit and the switching at the output of the input latch. There is no general theory about the
relationship between the switching activity of the inputs and that of the internal nodes of a
combinational circuit. We believe that if switching is high at the inputs, the internal switching at the
combinational circuit also tends to be high and vice versa. Different instruction sequences can have
significantly different effects on the switching activity; the impact depends on the architecture. In a
CISC processor, the impact is not obvious since one instruction may need several cycles to execute. In
contrast, for a pipelined RISC-like embedded processor, most instructions execute in one cycle, and the
impact can be significant since the instruction scheduler schedules more instructions. To better
understand the impact of instruction sequence on switching activity, we selected RISC-like pipelined
processor VLSI-BAM (VLSI Berkeley Abstract Machine) as an experimental architecture. This
processor is pipelined with data-stationary control; that is, each pipeline stage has separate
controls: instruction fetch, instruction decode, instruction execution, memory access, and
write back. The instruction set of the VLSI-BAM is similar to that of the MIPS 2000, with
extensions for symbolic computation. Figure 2 shows the pipeline stages and the control path
of the VLSI-BAM. Each pipeline stage has an instruction register, a programmable logic array
(PLA), and a latch for control signals. The processor passes instructions through the
instruction registers, and the PLA decodes them in each stage. The processor then generates
control signals from the PLAs and usually latches them before sending them down the data path.
(This may not always be true in an actual VLSI-BAM implementation.) We built a cycle-by-cycle
instruction-level simulator for collecting the switching at the latches in the control path
during execution of benchmark programs. The benchmarks, shown in Table 1, all come from the
Aquarius suite, and we first put them through the Aquarius Prolog compiler. The compiler
produces an intermediate BAM code that is target-machine independent. We then further compile
the BAM code into code for the target machine, VLSI-BAM.
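At its core, a simulator of this kind counts bit toggles at each latch, cycle by cycle. A
minimal Python sketch of that measurement (a hypothetical latch trace, not the actual VLSI-BAM
simulator):

```python
def switching_activity(trace):
    """Count bit transitions between successive latch values.
    `trace` is a list of integer words captured once per clock cycle."""
    toggles = 0
    for prev, curr in zip(trace, trace[1:]):
        toggles += bin(prev ^ curr).count("1")  # Hamming distance per cycle
    return toggles

# A hypothetical 4-cycle trace of an 8-bit latch: 8 + 0 + 4 transitions.
assert switching_activity([0x00, 0xFF, 0xFF, 0x0F]) == 12
```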
Gray code addressing
Pipelined embedded processors produce an instruction address during each cycle by selecting the
address that the counter or address adder generates. Due to instruction locality during program
execution, the processor accesses instructions sequentially most of the time. Gray code (see What is
Gray code box, next page) characteristically changes by only one bit as it sequences from one number to
the next. Thus Gray code has an advantage over straight binary code since each memory access changes
the address by only one bit. Therefore we can eliminate a significant number of bit switches using Gray
code addressing.
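The single-bit-change property of Gray code is easy to see from the standard reflected-Gray
conversion, sketched here in Python:

```python
def binary_to_gray(n):
    """Convert a binary number to its reflected Gray code."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Invert the conversion by cascading the XOR down the word."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Successive addresses differ in exactly one bit under Gray coding,
# which is why sequential instruction fetches switch only one address line.
for addr in range(15):
    diff = binary_to_gray(addr) ^ binary_to_gray(addr + 1)
    assert bin(diff).count("1") == 1
```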
Starting from the existing architecture for FM0/Manchester encoding with SLOS, we add Miller
encoding by attaching another MUX, so the user can select the type of encoding with two
selection lines. The selection truth table is shown below.
Chapter-2
Literature survey:
The radio frequency identification (RFID) system is becoming one of the most
popular systems in wireless technologies. The UHF RFID tag emulator is a part
of the RFID testing tools: it imitates the behavior of an RFID tag. The UHF RFID
tag emulator (860 MHz to 960 MHz) is aimed at testing RFID systems and also acts
as a general-purpose data transport device for other RFID systems. The tag emulator
belongs to the EPC Class-III (semi-passive) tags, but it implements the Class-1
Generation-2 (C1G2) air interface protocol for communicating with the reader. In
this work, an RTL design of a Manchester encoder is presented. Finite state machine
(FSM) and RTL implementations of the encoder are discussed, with particular focus
on using the RFID emulator as a data transport device and debugging tool. The
synthesis results show that the FSM design is efficient (less area and higher speed)
and operates at a maximum frequency of 256.54 MHz.
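Manchester encoding itself is simple to state: each data bit becomes a mid-bit transition. A
behavioural Python sketch (assuming the IEEE 802.3 convention, where a 0 is sent high-to-low
and a 1 low-to-high; the C1G2 tag-to-reader link actually uses FM0 or Miller subcarrier
encoding, so this is only the encoder concept, not the protocol):

```python
def manchester_encode(bits):
    """Manchester-encode a bit sequence (IEEE 802.3 convention assumed).
    Returns two half-bit samples per input bit, with the information
    carried by the mid-cell transition."""
    out = []
    for b in bits:
        out += [0, 1] if b else [1, 0]  # 1: low->high, 0: high->low
    return out

assert manchester_encode([1, 0, 1]) == [0, 1, 1, 0, 0, 1]
```

An FSM implementation in RTL does the same thing with a half-bit-period state toggle driving
an XOR of the data with the clock phase.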
7. Top-down design of joint MODEM and CODEC detection schemes for DSRC
coded-FSK systems over high-mobility fading channels:
Joint detection and verification of frequency shift keying (FSK) modulation
and demodulation (MODEM) and Manchester coding and decoding (CODEC) schemes are
proposed for dedicated short range communication (DSRC) systems over high-mobility
fading channels. The proposed joint coded-FSK detection scheme, with the benefit of
low complexity, can outperform the conventional separated coded-FSK detection scheme,
because the joint scheme exploits a time diversity gain to enhance detection performance.
Moreover, the proposed joint algorithms, in floating-point and fixed-point designs, are
verified on a software-defined-radio (SDR) platform. Based on measurement results
obtained with the SDR equipment, it is confirmed that the VHDL hardware implementation
of the proposed joint detection scheme provides robust performance over high-mobility
Rician multipath fading channel environments.
The design of VLSI circuits today has become very challenging indeed. The main
factor affecting system performance is the interconnect delay. Many algorithms have
been proposed to solve the interconnect timing optimization problem. Research has
shown that techniques like buffer insertion and wire-sizing have been proven to be very
effective in reducing interconnect delay. This paper describes a graph-based routing
algorithm to solve the interconnect delay optimization problem in a deep submicron
VLSI layout routing. The algorithm finds the optimal delay routing paths
with simultaneous consideration of buffer insertions and wire-sizing, while taking into
account wire or buffer obstacles. The proposed algorithm, called S-RABILA
(Simultaneous Routing and Buffer Insertion with Look-Ahead), utilizes a novel look-
ahead technique that significantly contributes to the computational efficiency of the
proposed algorithm. In this paper, the performance of S-RABILA is presented, which
shows the effectiveness of the look-ahead scheme. Experimental results also indicate that
the proposed algorithm provides significant improvements over similar existing VLSI
routing algorithms.
Chapter 3
VLSI DESIGN:
The original business plan was to be a contract wafer fabrication company, but the
venture investors wanted the company to develop IC (Integrated Circuit) design tools to
help fill the foundry. Thanks to its Caltech and UC Berkeley students, VLSI was an
important pioneer in the electronic design automation (EDA) industry. It offered a
sophisticated package of tools, originally based on the 'lambda-based' design style
advocated by Carver Mead and Lynn Conway. VLSI became an early vendor of standard
cell (cell-based technology) to the merchant market in the early 1980s where the other
ASIC-focused company, LSI Logic, was a leader in gate arrays. Prior to VLSI's cell-
based offering, the technology had been primarily available only within large vertically
integrated companies with semiconductor units such as AT&T and IBM. VLSI's design
tools included not only design entry and simulation but eventually also cell-based routing
(chip compiler), a datapath compiler, SRAM and ROM compilers, and a state machine
compiler. The tools were an integrated design solution for IC design and not just point
tools, or more general purpose system tools. A designer could edit transistor-level
polygons and/or logic schematics, then run DRC and LVS, extract parasitics from the
layout and run Spice simulation, then back-annotate the timing or gate size changes into
the logic schematic database. Characterization tools were integrated to generate
FrameMaker Data Sheets for Libraries. VLSI eventually spun off the CAD and Library
operation into Compass Design Automation but it never reached IPO before it was
purchased by Avanti Corp. VLSI's physical design tools were critical not only to its ASIC
business, but also in setting the bar for the commercial electronic design
automation (EDA) industry. When VLSI and its main ASIC competitor, LSI Logic, were
establishing the ASIC industry, commercially-available tools could not deliver the
productivity necessary to support the physical design of hundreds of ASIC designs each
year without the deployment of a substantial number of layout engineers. The companies'
development of automated layout tools was a rational "make because there's nothing to
buy" decision. The EDA industry finally caught up in the late 1980s when Tangent
Systems released its TanCell and TanGate products. In 1989, Tangent was acquired by
Cadence Design Systems (founded in 1988).
Unfortunately, for all VLSI's initial competence in design tools, they were not
leaders in semiconductor manufacturing technology. VLSI had not been timely in
developing a 1.0 µm manufacturing process as the rest of the industry moved to that
geometry in the late 1980s. VLSI entered a long-term technology partnership
with Hitachi and finally released a 1.0 µm process and cell library (actually more of a
1.2 µm library with a 1.0 µm gate). As VLSI struggled to gain parity with the rest of the
industry in semiconductor technology, the design flow was moving rapidly to a Verilog
HDL and synthesis flow. Cadence acquired Gateway, the leader in Verilog hardware
design language (HDL) and Synopsys was dominating the exploding field of design
synthesis. As VLSI's tools were being eclipsed, VLSI waited too long to open the tools
up to other fabs and Compass Design Automation was never a viable competitor to
industry leaders. Meanwhile, VLSI entered the merchant high speed static RAM (SRAM)
market as they needed a product to drive the semiconductor process technology
development. All the large semiconductor companies built high speed SRAMs with cost
structures VLSI could never match. VLSI withdrew once it was clear that the Hitachi
process technology partnership was working. ARM Ltd was formed in 1990 as a
semiconductor intellectual property licensor, backed by Acorn, Apple and VLSI. VLSI
became a licensee of the powerful ARM processor and ARM finally funded processor
tools. Initial adoption of the ARM processor was slow. Few applications could justify the
overhead of an embedded 32-bit processor. In fact, despite the addition of further
licensees, the ARM processor enjoyed little market success until they developed the
novel 'Thumb' extensions. Ericsson adopted the ARM processor in a VLSI chipset for its
GSM handset designs in the early 1990s. It was this GSM boost that laid the foundation
for the ARM company and technology as it is today. Only in PC chipsets did VLSI dominate
in the early 1990s. This product line, developed by five engineers using the 'Megacells'
in the VLSI library, led to a business unit at VLSI that almost equaled its ASIC business
in revenue. VLSI eventually ceded the market to Intel because Intel was able to
package-sell its processors, chipsets, and even board-level products together. VLSI also
had an early partnership with PMC, a design group that had been nurtured by British
Columbia Bell. When PMC wanted to divest its semiconductor intellectual property venture, VLSI's
bid was beaten by a creative deal by Sierra Semiconductor. The telecom business unit
management at VLSI opted to go it alone. PMC Sierra became one of the most important
telecom ASSP vendors. Scientists and innovations from the 'design technology' part of
VLSI found their way to Cadence Design Systems (by way of Redwood Design
Automation). Compass Design Automation (VLSI's CAD and Library spin-off) was sold
to Avant! Corporation, which itself was acquired by Synopsys.
ENCODERS AND DECODERS:
Two or more small decoders with enable inputs can be combined to form a larger decoder,
e.g., a 3-to-8-line decoder constructed from two 2-to-4-line decoders.
A decoder with an enable input can also function as a demultiplexer.
3:8 decoder
It uses all AND gates, and therefore the outputs are active-high; for active-low outputs,
NAND gates are used. It has 3 input lines and 8 output lines. It is also called a
binary-to-octal decoder: it takes a 3-bit binary input code and activates the one of the
8 (octal) outputs corresponding to that code. The truth table is as follows:
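The decoder's behaviour can be modelled in a few lines of Python (a behavioural sketch, not
gate-level; the function name and argument order are illustrative):

```python
def decode_3to8(a2, a1, a0, enable=1, active_low=False):
    """3-to-8 line (binary-to-octal) decoder: asserts exactly one of
    eight outputs for the 3-bit input code when enabled."""
    index = (a2 << 2) | (a1 << 1) | a0
    outs = [1 if (enable and i == index) else 0 for i in range(8)]
    if active_low:                       # NAND-based variant inverts the outputs
        outs = [1 - o for o in outs]
    return outs

# Input code 101 (octal 5) drives output line Y5 high:
assert decode_3to8(1, 0, 1) == [0, 0, 0, 0, 0, 1, 0, 0]
```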
Octal to binary encoder
An octal-to-binary encoder takes 8 inputs and provides 3 outputs, the opposite of what the
3-to-8 decoder does. At any one time, only one input line has a value of 1. The figure
below shows the truth table of an octal-to-binary encoder.
For an 8-to-3 binary encoder with inputs I0-I7 the logic expressions of the outputs Y0-Y2
are:
Y0 = I1 + I3 + I5 + I7
Y1= I2 + I3 + I6 + I7
Y2 = I4 + I5 + I6 +I7
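These three OR expressions translate directly into code. A Python sketch (illustrative names;
the `+` in the expressions above is logical OR):

```python
def octal_to_binary_encode(inputs):
    """8-to-3 encoder implementing Y0..Y2 as the OR expressions above.
    `inputs` is [I0..I7] with exactly one line set to 1."""
    I = inputs
    y0 = I[1] | I[3] | I[5] | I[7]
    y1 = I[2] | I[3] | I[6] | I[7]
    y2 = I[4] | I[5] | I[6] | I[7]
    return y2, y1, y0

# Asserting I5 alone encodes to binary 101:
assert octal_to_binary_encode([0, 0, 0, 0, 0, 1, 0, 0]) == (1, 0, 1)
```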
Priority encoder
A priority encoder is a circuit or algorithm that compresses multiple binary inputs into a
smaller number of outputs. The output of a priority encoder is the binary representation
of the ordinal number starting from zero of the most significant input bit. They are often
used to control interrupt requests by acting on the highest priority request. It includes
priority function. If 2 or more inputs are equal to 1 at the same time, the input having the
highest priority will take precedence. Internal hardware will check this condition and
priority is set.
Table 4: Truth table of a 4-bit priority encoder
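The precedence rule described above reduces to scanning the inputs from the most significant
side. A behavioural Python sketch of a 4-input priority encoder (names are illustrative; a
valid flag distinguishes "no input asserted" from "input I0 asserted"):

```python
def priority_encode(inputs):
    """4-input priority encoder: returns the index of the highest-priority
    (most significant) asserted input, plus a valid flag."""
    for i in reversed(range(len(inputs))):   # I3 has the highest priority
        if inputs[i]:
            return i, 1
    return 0, 0                              # no input asserted

# I3 wins even though I1 is also high:
assert priority_encode([0, 1, 0, 1]) == (3, 1)
```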
Multiplexer
Demultiplexer
A demultiplexer (or demux) is a device taking a single input signal and selecting one of
many data-output-lines, which is connected to the single input. A multiplexer is often
used with a complementary demultiplexer on the receiving end. A demultiplexer is a
single-input, multiple-output switch. Demultiplexers take one data input and a number of
selection inputs, and they have several outputs. They forward the data input to one of the
outputs depending on the values of the selection inputs.
Demultiplexers are sometimes convenient for designing general-purpose logic, because if
the demultiplexer's input is always true, the demultiplexer acts as a decoder. This means
that any function of the selection bits can be constructed by logically OR-ing the correct
set of outputs. A demultiplexer is also called a 'distributor', since it transmits the same
data to different destinations.
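The demux-as-decoder observation is easy to verify with a small behavioural model (a sketch;
the 1-to-4 size and function name are chosen for illustration):

```python
def demux_1to4(data, s1, s0):
    """1-to-4 demultiplexer: routes `data` to the output chosen by the
    select lines; all other outputs stay 0."""
    sel = (s1 << 1) | s0
    return [data if i == sel else 0 for i in range(4)]

# With the data input tied to 1 the demux behaves as a 2-to-4 decoder:
assert demux_1to4(1, 1, 0) == [0, 0, 1, 0]
# With data = 0 every output is 0, regardless of the select lines:
assert demux_1to4(0, 1, 0) == [0, 0, 0, 0]
```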
FLIPFLOP:
In electronics, a flip-flop or latch is a circuit that has two stable states and can be
used to store state information. A flip-flop is a bistable multivibrator. The circuit can be
made to change state by signals applied to one or more control inputs and will have one
or two outputs. It is the basic storage element in sequential logic. Flip-flops and latches
are a fundamental building block of digital electronics systems used in computers,
communications, and many other types of systems. Flip-flops and latches are used as data
storage elements. A flip-flop stores a single bit (binary digit) of data; one of its two states
represents a "one" and the other represents a "zero". Such data storage can be used for
storage of state, and such a circuit is described as sequential logic. When used in a finite-
state machine, the output and next state depend not only on its current input, but also on
its current state (and hence, previous inputs). It can also be used for counting of pulses,
and for synchronizing variably-timed input signals to some reference timing signal. Flip-
flops can be either simple (transparent or opaque) or clocked (synchronous or edge-
triggered). Although the term flip-flop has historically referred generically to both simple
and clocked circuits, in modern usage it is common to reserve the term flip-
flop exclusively for discussing clocked circuits; the simple ones are commonly
called latches.[1][2] Using this terminology, a latch is level-sensitive, whereas a flip-flop is
edge-sensitive. That is, when a latch is enabled it becomes transparent, while a flip flop's
output only changes on a single type (positive going or negative going) of clock edge.
The D flip-flop captures the value of the D-input at a definite portion of the clock cycle
(such as the rising edge of the clock). That captured value becomes the Q output. At other
times, the output Q does not change.[22][23] The D flip-flop can be viewed as a memory
cell, a zero-order hold, or a delay line.
Truth table:
Clock D Qnext
Rising edge 0 0
Rising edge 1 1
Non-Rising X Q
Most D-type flip-flops in ICs have the capability to be forced to the set or reset state
(which ignores the D and clock inputs), much like an SR flip-flop. Usually, the illegal
S = R = 1 condition is resolved in D-type flip-flops. By setting S = R = 0, the flip-flop
can be used as described above. Here is the truth table for the other possible S and R
configurations:
Inputs Outputs
S R D > Q Q'
0 1 X X 0 1
1 0 X X 1 0
1 1 X X 1 1
These flip-flops are very useful, as they form the basis for shift registers, which are an
essential part of many electronic devices. The advantage of the D flip-flop over the D-
type "transparent latch" is that the signal on the D input pin is captured the moment the
flip-flop is clocked, and subsequent changes on the D input will be ignored until the next
clock event. An exception is that some flip-flops have a "reset" signal input, which will
reset Q (to zero), and may be either asynchronous or synchronous with the clock.
The above circuit shifts the contents of the register to the right, one bit position on each
active transition of the clock. The input X is shifted into the leftmost bit position.
This circuit[24] consists of two stages implemented by SR NAND latches. The input stage
(the two latches on the left) processes the clock and data signals to ensure correct input
signals for the output stage (the single latch on the right). If the clock is low, both the
output signals of the input stage are high regardless of the data input; the output latch is
unaffected and it stores the previous state. When the clock signal changes from low to
high, only one of the output voltages (depending on the data signal) goes low and
sets/resets the output latch: if D = 0, the lower output becomes low; if D = 1, the upper
output becomes low. If the clock signal continues staying high, the outputs keep their
states regardless of the data input and force the output latch to stay in the corresponding
state as the input logical zero (of the output stage) remains active while the clock is high.
Hence the role of the output latch is to store the data only while the clock is low.
The circuit is closely related to the gated D latch, as both circuits convert the two D
input states (0 and 1) into two input combinations (01 and 10) for the output SR latch by
inverting the data input signal (both circuits split the single D signal into two
complementary S and R signals). The difference is that in the gated D latch simple
NAND logical gates are used while in the positive-edge-triggered D flip-flop SR NAND
latches are used for this purpose. The role of these latches is to "lock" the active output
producing low voltage (a logical zero); thus the positive-edge-triggered D flip-flop can
also be thought of as a gated D latch with latched input gates.
A master–slave D flip-flop. It responds on the falling edge of the enable input (usually a
clock)
An implementation of a master–slave D flip-flop that is triggered on the rising edge of
the clock
For a positive-edge triggered master–slave D flip-flop, when the clock signal is low
(logical 0) the "enable" seen by the first or "master" D latch (the inverted clock signal) is
high (logical 1). This allows the "master" latch to store the input value when the clock
signal transitions from low to high. As the clock signal goes high (0 to 1) the inverted
"enable" of the first latch goes low (1 to 0) and the value seen at the input to the master
latch is "locked". Nearly simultaneously, the twice inverted "enable" of the second or
"slave" D latch transitions from low to high (0 to 1) with the clock signal. This allows the
signal captured at the rising edge of the clock by the now "locked" master latch to pass
through the "slave" latch. When the clock signal returns to low (1 to 0), the output of the
"slave" latch is "locked", and the value seen at the last rising edge of the clock is held
while the "master" latch begins to accept new values in preparation for the next rising
clock edge.
By removing the leftmost inverter in the circuit at side, a D-type flip-flop that strobes on
the falling edge of a clock signal can be obtained. This has a truth table like this:
D Q > Qnext
0 X Falling 0
1 X Falling 1
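The edge-triggered behaviour summarised in these truth tables can be captured in a small
behavioural model (a Python sketch, not a gate-level description; the class name is
illustrative):

```python
class DFlipFlop:
    """Positive-edge-triggered D flip-flop: Q takes the value of D only
    on a rising clock edge and holds it otherwise."""
    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk, d):
        if clk == 1 and self._prev_clk == 0:   # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q

ff = DFlipFlop()
ff.tick(0, 1)               # clock low: D is ignored
assert ff.tick(1, 1) == 1   # rising edge captures D = 1
assert ff.tick(1, 0) == 1   # clock held high: later D changes are ignored
```

Swapping the edge test to a falling edge (`clk == 0 and self._prev_clk == 1`) models the
negative-edge variant obtained by removing the leftmost inverter.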
When the data statistics are not known beforehand, and the transition probability of each bit
line changes over time, varying from low to high across the bit lines, using a fixed sub-group
or cluster of bit lines reduces the savings margin, since the transition correlation changes
with time. The best way to enrich the transition reduction is to extract the signal statistics
before encoding by observing the data over time. This establishes the transition correlation
among the bit lines adaptively and dynamically forms a cluster of highly correlated bit lines
within a fixed observation window. This gives the scheme an advantage over existing encoding
schemes, which cannot efficiently handle situations where the characteristics of the
transmitted data change abruptly.
Proposed Encoding Scheme: Theoretical Background
The proposed approach encodes the data to minimize the self-switching activity before they are
introduced into the off-chip bus with the objective of reducing average power dissipation. The main idea
is to evaluate the switching statistics for each bit line by observing data stream over an observation
window, and to establish the transition correlation among them. The highly correlated bit lines form a
cluster, which changes across different observation windows as local switching probability changes. In
each observation window, one bit line is designated as a basis line, which has maximum correlated
switching transitions with the other lines. The lines which have maximum correlation with the basis are
clustered together. In other words, the switching transitions of all the clustered lines of the bus in that
particular observation window have maximum projection component along the switching transitions of
the selected basis line. When all the clustered lines are XOR-ed with the basis, it leads to maximum
switching savings. The clustering information is sent using temporal redundancy between two adjacent
windows, while the basis is sent as an extra line as a spatial redundancy. A sample of an encoded
observation window, with spatiotemporal redundancy, is shown in Fig. 1. The entire process can be
symbolically represented as follows.
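The XOR-with-basis step at the heart of the scheme can be sketched behaviourally in Python (an
illustration under assumed conventions: `window` is a list of bus words, one per cycle, and bit
positions in `cluster` are XOR-ed with the basis line; the actual encoder also transmits the
clustering information and the basis line itself):

```python
def xor_with_basis(window, basis_idx, cluster):
    """Within one observation window, XOR each clustered bit line with
    the chosen basis line; correlated transitions then cancel, leaving
    the clustered lines (nearly) static."""
    encoded = []
    for word in window:
        basis_bit = (word >> basis_idx) & 1
        out = word
        for j in cluster:
            if j != basis_idx and basis_bit:
                out ^= (1 << j)      # flip clustered bit whenever basis is 1
        encoded.append(out)
    return encoded

# Lines 0 and 1 toggle together; after XOR with basis line 0,
# line 1 never toggles (3 transitions saved on that line):
assert xor_with_basis([0b00, 0b11, 0b00, 0b11], 0, [1]) == [0b00, 0b01, 0b00, 0b01]
```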
In this section, we present a statistical analysis of the proposed algorithm in a single
observation window. The same analysis is also done for BIC and APBIC, and the advantage of our
algorithm is clearly revealed. Consider a data source which generates symbols S. At any time
t, an N-bit word forms a symbol s ∈ S, where s = [b0, b1, . . . , bN−1]. An N-bit-wide bus
carries the symbols over time, where each bit line exhibits a different local transition
probability p ∈ P, p = [p0, p1, . . . , pN−1]. The transition probability of each bit line can
be computed by dividing the occurrence frequency of S01 or S10 by the window size W, e.g.,
p0 = (S01/W)|b0, where S01 and S10 denote the total numbers of 0-to-1 and 1-to-0 transitions.
In our proposed method, the expected savings from each window rely on the selection of the
basis line. The probability of being the basis line differs across bit lines, since the
expected savings contribution changes with the bit line chosen as basis. Consider bi as the
basis line for a bus of width N. The XOR operation between the basis and another bit line bj
at the final stage of our encoding technique leads to a switching reduction on bj. Table I
shows the impact on bj of performing bi ⊕ bj (here ⊕ denotes the XOR operation) and the
probabilities of the different outcomes at any switching time. The combined probability of no
switching change on bj, represented here as χ5, takes the value (1 − pi). Extending the
concept of the XOR outcome to a transition observation window of width W leads to the same
three scenarios with different probabilities, where the decrease or increase of the switching
count can take any value between 1 and W. The probability of k switching savings in any
window, defined here as χ^{i,j}_{W,k}, from bit line bj is expressed by (7), where the
summation limits are from n = 1 to ((W/2) − ((k − 1)/2)) if k is odd, and from n = 1 to
((W/2) − (k/2) + 1) if k is even. Equation (7) is obtained by considering all the scenarios
for each savings count. For example, when W = 8, the probability of six switching savings
χ^{i,j}_{8,6} on bj comes from two possible scenarios: six switching savings, or seven
switching savings together with one increase in the switching count, over eight clock steps.
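The per-line transition probabilities that drive the analysis are straightforward to compute
from an observed window. A Python sketch (illustrative names; it counts both 0-to-1 and
1-to-0 transitions and divides by the window size W, as in p0 = S01/W above):

```python
def transition_probabilities(window, width):
    """Per-line transition probability over one observation window.
    `window` is a list of W bus words; returns [p0, ..., p(width-1)]."""
    W = len(window)
    probs = []
    for j in range(width):
        line = [(w >> j) & 1 for w in window]            # bit line j over time
        s = sum(a != b for a, b in zip(line, line[1:]))  # S01 + S10 count
        probs.append(s / W)
    return probs

# Bit 0 toggles every cycle, bit 1 never, over a window of W = 4 words:
assert transition_probabilities([0b01, 0b00, 0b01, 0b00], 2) == [0.75, 0.0]
```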
Adaptive encoder:
In this section, we demonstrate a possible implementation of the proposed algorithm. Fig. 4 shows the basic block diagram of the proposed encoding methodology for a bus of width N. It consists of decision blocks, delay elements, a set of XOR gates, and a multiplexer. The decision block consists of an eliminator, a cluster-formation unit, and a basis-selection unit. It generates the control information corresponding to the cluster for each observation window, and the multiplexer inserts the temporal redundancy. The sequence of operations in the decision block is shown in Fig. 5. Each element of a row evaluates the savings contribution of one line and decides the presence of that bit line in the cluster as per (5) for each observation window. The savings computation unit at the end of each row takes the outputs of the bitwise savings computation units and computes the overall savings for each bit line if it were chosen as basis. It can be implemented by a balanced carry-save adder tree. Since the basis line is an obvious candidate for the cluster and incurs no savings from itself, the diagonal of the matrix contributes no additional hardware cost. Basis selection is simply an index-selection unit, which compares the overall savings contribution due to the selection of each bit line as basis and finally determines the best basis among all lines as per (6). Later in this section, we give some detailed insight into the bitwise savings computation unit and the eliminator block.
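The decision-block flow (pairwise savings matrix, row-wise accumulation, basis selection) can be modelled in software. The sketch below is our own assumption of the algorithm's behaviour, not the paper's RTL: it forms a cluster from lines with positive savings α(i, j), in the spirit of (5), and picks the basis with the largest total savings, in the spirit of (6).

```python
# Schematic model of the decision block over one observation window.
# All names (transitions, alpha, select_basis) are ours.

def transitions(line):
    return sum(a != b for a, b in zip(line, line[1:]))

def alpha(basis, other):
    """Savings on `other` when encoded as basis XOR other."""
    enc = [x ^ y for x, y in zip(basis, other)]
    return transitions(other) - transitions(enc)

def select_basis(lines):
    """Return (basis index, cluster) maximising total window savings."""
    best_i, best_total, best_cluster = 0, 0, []
    for i, bi in enumerate(lines):
        # Only lines with positive savings join the cluster (cf. (5)).
        cluster = [j for j, bj in enumerate(lines)
                   if j != i and alpha(bi, bj) > 0]
        total = sum(alpha(bi, lines[j]) for j in cluster)
        if total > best_total:
            best_i, best_total, best_cluster = i, total, cluster
    return best_i, best_cluster

lines = [[0, 1, 0, 1, 0, 1, 0, 1],
         [0, 1, 0, 1, 0, 1, 1, 0],
         [1, 0, 1, 0, 1, 0, 1, 0],
         [0, 0, 0, 0, 1, 1, 1, 1]]
basis, cluster = select_basis(lines)
print(basis, cluster)
```

In hardware, the nested loop corresponds to the matrix of bitwise savings units in Fig. 5, the `sum` to the carry-save adder tree at the end of each row, and the final comparison to the index-selection (basis selection) unit.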
The presence of the eliminator block reduces the internal node switching of the encoder. It eliminates bit lines from the cluster at an early stage without computing αi,j. It also groups the potential basis candidates among all lines, which further decreases the internal switching count. Fig. 6 shows the basic eliminator block when the transition observation window is 16 clock cycles. The computation corresponds to the (bi, bj) element of the Fig. 5 matrix, where bi is chosen as basis. The hardware implementation is optimized by removing bit lines with switching probability less than 0.25 from basis consideration. This simplification is justified since such lines have a very low probability of being chosen as the basis, and eliminating them at this stage saves power in the encoder. Regi or Regj stores the self-switching count; the number of these registers equals the bus width. The hardware components of this section are shared among all other elements of the matrix. A positive value of αi,j can be ensured if the switching count of bj is greater than half the switching count of the basis [see (3) and (4)]. The eliminator block takes this scenario into account, using the comparator, to eliminate inessential computation. The comparator output enables the computation of αi,j, as shown in Fig. 7. The diagram demonstrates one possible way to implement the computation of bitwise switching savings for a particular observation window. One input of the final subtracter is the switching count of the basis for that window; a 4-bit register stores the joint transition count of basis bi and line bj. The hardware components (adder and register) and the stored register value are shared with the mirror element of (bi, bj). This reduces encoder dynamic power by minimizing internal node switching. The overall savings computation unit is an adder stage that takes inputs from the bitwise savings computation blocks and computes the total switching savings when a particular line is chosen as the basis. The basis selection unit takes the outputs of all the overall savings computation units and finally selects one of the bit lines as basis [see (6)]. For the logical interpretation of the basis selection unit, consider that t(n,i) represents the nth bit from the left in the N-bit binary representation of the number of switching savings obtained if the ith bit line is chosen as basis. P(n,i) stores the status of the ith line when the comparison has been done up to the nth bit from the left; if P(n,i) is set, the ith line is still in consideration for being the basis after the most significant n bits have been compared. The equation for the basis selection and cluster-information unit, for the MSB of the input to the basis selection unit, is given in (14), where the summation represents logical OR. This has to be repeated for all N bits of the input. The gate-level implementation of the Boolean expression in (14) is given in the Appendix. The decoder architecture, which retrieves the data in its original form, is presented in Fig. 8. The cluster information is sent as a control signal at the beginning of the encoded data for every observation window. Before decoding the data, the decoder extracts this cluster information by observing the transition between the control signal and the encoded data of the previous observation window. This information is kept in a register for each bit line until the end of decoding for the current window.
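Since XOR is its own inverse, the decoder can recover the data by repeating the encoder's XOR step with the same basis and cluster information. The round-trip model below is our simplified software illustration of this property (the hardware decoder is in Fig. 8); all function names are ours.

```python
# Round-trip sketch: lines in the cluster are sent as basis XOR line, the
# basis line itself is sent unencoded, so decoding is the same operation.

def encode(lines, basis, cluster):
    out = [list(l) for l in lines]
    for j in cluster:
        out[j] = [x ^ y for x, y in zip(lines[basis], lines[j])]
    return out

def decode(lines, basis, cluster):
    # XOR is its own inverse, so decoding mirrors encoding exactly.
    return encode(lines, basis, cluster)

data = [[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0]]
sent = encode(data, basis=0, cluster=[1, 2])
assert decode(sent, basis=0, cluster=[1, 2]) == data
print("round-trip OK")
```

This is why the scheme loses no information despite the encoding: as long as the decoder receives the correct cluster information for the window, the original data is recovered exactly.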
Software details:
6.1 Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is Verilog:
Verilog, standardized as IEEE 1364, is a hardware description language (HDL)
used to model electronic systems. It is most commonly used in the design and verification
of digital circuits at the register-transfer level of abstraction. It is also used in the
verification of analog circuits and mixed-signal circuits.
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and signal strengths (strong, weak, etc.). This system allows abstract
modeling of shared signal lines, where multiple sources drive a common net. When a
wire has multiple drivers, the wire's (readable) value is resolved by a function of the
source drivers and their strengths. A subset of statements in the Verilog language
is synthesizable. Verilog modules that conform to a synthesizable coding style, known as
RTL (register-transfer level), can be physically realized by synthesis software. Synthesis
software algorithmically transforms the (abstract) Verilog source into a netlist, a logically
equivalent description consisting only of elementary logic primitives (AND, OR, NOT,
flip-flops, etc.) that are available in a specific FPGA or VLSI technology. Further
manipulations to the netlist ultimately lead to a circuit fabrication blueprint (such as a
photo mask set for an ASIC or a bitstream file for an FPGA).
Example:
module main;
initial
begin
$display("Hello world!");
$finish;
end
endmodule
The Verilog Procedural Interface (VPI), originally known as PLI 2.0, is an
interface primarily intended for the C programming language. It allows behavioral
Verilog code to invoke C functions, and C functions to invoke standard Verilog system
tasks. The Verilog Procedural Interface is part of the IEEE 1364 Programming Language
Interface standard; the most recent edition of the standard is from 2005. VPI is sometimes
also referred to as PLI 2, since it replaces the deprecated Program Language Interface
(PLI).
While PLI 1 was deprecated in favor of VPI (a.k.a. PLI 2), PLI 1 is still commonly used
over VPI due to its much more widely documented tf_put, tf_get function interface, which is
described in many Verilog reference books.
module toplevel(clock, reset);
input clock;
input reset;

reg flop1;
reg flop2;

// Nonblocking assignments (<=) sample the right-hand sides before any
// update occurs, so flop1 and flop2 swap values on every clock edge.
always @(posedge reset or posedge clock)
  if (reset)
    begin
      flop1 <= 0;
      flop2 <= 1;
    end
  else
    begin
      flop1 <= flop2;
      flop2 <= flop1;
    end
endmodule
The definition of constants in Verilog supports the addition of a width parameter. The
basic syntax is: <size>'<base><value>, where <size> gives the width in bits and <base> is
b (binary), o (octal), d (decimal), or h (hexadecimal).
Examples: 1'b1 (a 1-bit binary one), 8'hFF (an 8-bit hexadecimal constant, all ones),
12'd255 (a 12-bit decimal constant).
SOFTWARE INFORMATION:
Create a New Project
Create a new ISE project which will target the FPGA device on the Spartan-3 Startup Kit demo board. To create a new project:
1. Select File > New Project... The New Project Wizard appears.
2. Type tutorial in the Project Name field.
3. Enter or browse to a location (directory path) for the new project. A tutorial subdirectory is created automatically.
4. Verify that HDL is selected from the Top-Level Source Type list.
5. Click Next to move to the device properties page.
6. Fill in the properties in the table as shown below:
♦ Product Category: All
♦ Family: Spartan3
♦ Device: XC3S200
♦ Package: FT256
♦ Speed Grade: -4
♦ Top-Level Source Type: HDL
♦ Synthesis Tool: XST (VHDL/Verilog)
♦ Simulator: ISE Simulator (VHDL/Verilog)
♦ Preferred Language: Verilog (or VHDL)
♦ Verify that Enable Enhanced Design Summary is selected.
Leave the default values in the remaining fields. When the table is complete, your project
properties will look like the following:
The next step in creating the new source is to add the behavioral description for the counter. Use a simple counter code example from the ISE Language Templates and customize it for the counter design.
1. Place the cursor on the line below the output [3:0] COUNT_OUT; statement.
2. Open the Language Templates by selecting Edit → Language Templates… Note: You can tile the Language Templates and the counter file by selecting Window → Tile Vertically to make them both visible.
3. Using the “+” symbol, browse to the following code example: Verilog → Synthesis Constructs → Coding Examples → Counters → Binary → Up/Down Counters → Simple Counter.
4. With Simple Counter selected, select Edit → Use in File, or select the Use Template in File toolbar button. This step copies the template into the counter source file.
5. Close the Language Templates.
Design Simulation :
♦ The counter must operate correctly with an input clock frequency = 25 MHz.
♦ The DIRECTION input will be valid 10 ns before the rising edge of CLOCK
♦ Global Signals: GSR (FPGA) Note: When GSR (FPGA) is enabled, 100 ns is
added to the Offset value automatically.
8. Click Finish to complete the timing initialization.
9. The blue shaded areas that precede the rising edge of the CLOCK correspond to the Input Setup Time in the Initialize Timing dialog box. Toggle the DIRECTION port to define the input stimulus for the counter design as follows:
♦ Click on the blue cell at approximately 300 ns to assert DIRECTION high so that
the counter will count up.
♦ Click on the blue cell at approximately 900 ns to assert DIRECTION low so that
the counter will count down.
Simulation Results: