
CHAPTER 1: INTRODUCTION

The microprocessor plays a very significant role in the everyday functioning of
industrialized societies. There is practically no new machine, instrument, control
equipment or information system that does not have a microprocessor in it, and it is an
indispensable component of almost every digital system.
Put simply, a microprocessor is a silicon chip that contains a CPU. In the world of personal
computers, the terms microprocessor and CPU are used interchangeably. At the heart of
all personal computers and most workstations sits a microprocessor. Microprocessors
also control the logic of almost all digital devices, from clock radios to fuel-injection
systems for automobiles.
Typical microprocessors incorporate arithmetic and logic functional units as well
as the associated control logic, instruction processing circuitry, and a portion of the
memory hierarchy. Microprocessors also play supporting roles within larger computers as
smart controllers for graphics displays, storage devices, and high-speed printers.
However, the vast majority of microprocessors are used to control everything from
consumer appliances to smart weapons. The microprocessor has made possible the
inexpensive hand-held electronic calculator, the digital wristwatch, and the electronic
game. Microprocessors are used to control consumer electronic devices, such as the
programmable microwave oven and videocassette recorder; to regulate gasoline
consumption and antilock brakes in automobiles; to monitor alarm systems; and to
operate automatic tracking and targeting systems in aircraft, tanks, and missiles, and to
control radar arrays that track and identify aircraft, among other defense applications.

1.1 What is a microprocessor?


A microprocessor is a multipurpose, programmable, clock-driven, register-based
electronic device that reads binary instructions from a storage device called memory,
accepts binary data as input, processes the data according to those instructions, and
provides results as output. A microprocessor incorporates most or all of the functions of a
computer's central processing unit (CPU) on a single integrated circuit (IC).

Fig 1.1 Programmable machine (microprocessor, memory, input and output)

The first microprocessors emerged in the early 1970s and were used in electronic
calculators, using binary-coded decimal (BCD) arithmetic on 4-bit words. Other
embedded uses of 4-bit and 8-bit microprocessors, such as terminals, printers and various
kinds of automation, followed soon after. Affordable 8-bit microprocessors with 16-bit
addressing also led to the first general-purpose microcomputers from the mid-1970s
on.
During the 1960s, computer processors were often constructed out of small- and
medium-scale ICs containing from tens to a few hundred transistors. The integration of a
whole CPU onto a single chip greatly reduced the cost of processing power. From these
humble beginnings, continued increases in microprocessor capacity have rendered other
forms of computers almost completely obsolete, with one or more microprocessors used
in everything from the smallest embedded systems and handheld devices to the largest
mainframes and supercomputers.
Since the early 1970s, the increase in capacity of microprocessors has followed
Moore's law, which suggests that the number of transistors that can be fitted onto a chip
doubles every two years. Although originally calculated as a doubling every year, Moore
later refined the period to two years. It is often incorrectly quoted as a doubling of
transistors every 18 months.
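As a worked example of the doubling rule, the Python sketch below projects a transistor count forward in time as N(year) = N0 · 2^((year − year0) / period). The starting point of about 2,300 transistors in 1971 (roughly the Intel 4004) and the function name are assumptions for illustration; the output shows only what a two-year doubling implies, not measured chip data.

```python
def projected_transistors(n0, year0, year, doubling_period=2.0):
    """Transistor count projected by Moore's law.

    Assumes the count doubles every `doubling_period` years, starting from
    n0 transistors in year0:  N(year) = n0 * 2 ** ((year - year0) / doubling_period)
    """
    return n0 * 2 ** ((year - year0) / doubling_period)


# Hypothetical starting point: about 2,300 transistors in 1971.
for year in (1971, 1981, 1991):
    print(year, round(projected_transistors(2300, 1971, year)))
# prints 2300, 73600 and 2355200 for the three years
```

Passing doubling_period=1.0 or 1.5 gives the one-year and 18-month variants mentioned above, which diverge quickly from the two-year figure.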

Three basic characteristics differentiate microprocessors:


i. Instruction set: the set of instructions that the microprocessor can execute.
ii. Bandwidth: the number of bits processed in a single instruction.
iii. Clock speed: given in megahertz (MHz), the clock speed determines how many
clock cycles the processor performs per second and hence, together with the number of
cycles each instruction needs, how many instructions per second it can execute.

For bandwidth and clock speed, the higher the value, the more powerful the CPU, all else
being equal. For example, a 32-bit microprocessor that runs at 50 MHz is more powerful
than a 16-bit microprocessor that runs at 25 MHz. In addition to bandwidth and clock speed,
microprocessors are classified as being either RISC (reduced instruction set computer) or
CISC (complex instruction set computer).
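The clock-speed point can be made concrete: instructions per second equal the clock frequency divided by the average number of clock cycles per instruction (CPI). The sketch below is a simplified model; the CPI values are hypothetical, chosen only to show that clock speed alone does not settle the comparison.

```python
def mips(clock_hz, cycles_per_instruction):
    """Millions of instructions per second for a clock rate and an average CPI."""
    return clock_hz / cycles_per_instruction / 1e6


def bits_per_second(word_bits, clock_hz, cycles_per_instruction):
    """Data bits processed per second if every instruction handles one full word."""
    return word_bits * clock_hz / cycles_per_instruction


# The comparison from the text, assuming (hypothetically) one cycle per instruction.
print(mips(50e6, 1), mips(25e6, 1))                                  # 50.0 25.0
print(bits_per_second(32, 50e6, 1) / bits_per_second(16, 25e6, 1))   # 4.0

# With a hypothetical CPI of 4, the faster clock no longer wins on its own.
print(mips(50e6, 4))                                                 # 12.5
```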
The microprocessors used in systems are mainly of two types: (1) microcontrollers, which
integrate all the components shown in Fig 1.1 on a single chip, and (2) general-purpose
microprocessors, which are used with those components as discrete parts.

1.2 Harvard Architecture vs Von Neumann Architecture


The name Harvard Architecture comes from the Harvard Mark I relay-based
computer. The most obvious characteristic of the Harvard Architecture is that it has
physically separate signals and storage for code and data memory, so program memory
and data memory can be accessed simultaneously. Typically, code (or program)
memory is read-only and data memory is read-write, so the program cannot normally
modify its own instructions. Throughput can be increased because the next instruction
can be fetched while the current one is being executed.
The Von Neumann Architecture is named after the mathematician and early
computer scientist John von Neumann. Von Neumann machines have shared signals and
memory for code and data. Thus, the program can easily be modified by itself, since it is
stored in read-write memory. In the von Neumann architecture, program and data are
stored in the same memory and managed by the same information-handling subsystem; in
the Harvard architecture, they are stored and handled by different subsystems. This is
the essential difference between the two architectures. The main disadvantage of the
von Neumann arrangement is its lower operating bandwidth, because instruction fetches
and data accesses must share a single memory pathway.
However, in some niches, particularly certain embedded applications where the
program is more or less hard-wired, task requirements are such that the Harvard
architecture can provide distinct operational advantages. Under certain conditions, a
Harvard computer can be much faster than a von Neumann computer because data and
program do not contend for the same information pathway, and storing the program in an
immutable read-only memory can significantly improve reliability.
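A rough way to see the bandwidth argument is to count memory-bus cycles. The Python model below is a deliberate simplification, not a description of any real processor: it assumes every memory access takes one cycle, that each instruction needs one fetch plus optionally one data access, and that the Harvard machine can overlap the two because it has separate buses.

```python
def memory_cycles(program, harvard):
    """Count memory-bus cycles for a list of instructions.

    Each entry in `program` is True if that instruction also reads or writes
    data memory. Simplified model: one cycle per access; a von Neumann machine
    has one shared bus, a Harvard machine has separate code and data buses.
    """
    cycles = 0
    for uses_data in program:
        if harvard:
            cycles += 1                       # fetch and data access overlap
        else:
            cycles += 2 if uses_data else 1   # accesses queue on the shared bus
    return cycles


program = [True, False, True, True]           # hypothetical instruction mix
print(memory_cycles(program, harvard=False), memory_cycles(program, harvard=True))  # 7 4
```

The gap grows with the fraction of instructions that touch data memory, which is the "contention for the same information pathway" described above.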

Fig 1.2 Harvard and Von-Neumann architecture

1.3 Processor development using FPGA


A Field-Programmable Gate Array (FPGA) is an integrated circuit designed to be
configured by the customer or designer after manufacturing, hence "field-programmable".
The FPGA configuration is generally specified using a hardware
description language (HDL), similar to that used for an application-specific integrated
circuit (ASIC). FPGAs can be used to implement any logical function that an ASIC could
perform. The ability to update the functionality after shipping, the partial reconfiguration
of a portion of the design, and the low non-recurring engineering costs relative to an ASIC
design offer advantages for many applications. FPGAs are increasingly used in
conventional high-performance computing applications, where computational kernels such
as the FFT or convolution are performed on the FPGA instead of a microprocessor.

A recent trend has been to take the coarse-grained architectural approach a step
further by combining the logic blocks and interconnects of traditional FPGAs with
embedded microprocessors and related peripherals to form a complete "system on a
programmable chip". An alternative to such hard-macro processors is to use soft
processor cores, which are implemented within the FPGA logic.
FPGAs offer several benefits in industrial designs:
i. Design integration with the user's choice of intellectual property (IP) and software
stacks.
ii. Flexibility to change the design to keep pace with evolving protocols and new feature
requirements.
iii. Performance scaling with embedded processors and IP blocks within the FPGA.

CHAPTER 2: VHDL BASICS

2.1 Introduction
VHDL stands for VHSIC (Very High Speed Integrated Circuits) Hardware
Description Language. In the mid-1980s the U.S. Department of Defense and the IEEE
sponsored the development of this hardware description language with the goal of
developing very high-speed integrated circuits. It has now become one of the industry's
standard languages for describing digital systems.
Although hardware description languages look similar to conventional programming
languages, there are some important differences. A hardware description language is
inherently parallel: commands, which correspond to logic gates, are executed
(computed) in parallel as soon as a new input arrives. An HDL program mimics the
behavior of a physical, usually digital, system. It also allows timing specifications
(gate delays) to be incorporated, and a system to be described as an interconnection of
different components.
VHDL allows one to describe a digital system at the structural or the
behavioral level. The behavioral level can be further divided into two kinds of styles:
Data flow and Algorithmic. The dataflow representation describes how data moves
through the system. This is typically done in terms of data flow between registers. The
data flow model makes use of concurrent statements that are executed in parallel as soon
as data arrives at the input. Sequential statements, on the other hand, are executed in the
order in which they are specified. VHDL allows both concurrent and sequential signal
assignments; the kind of assignment determines the manner in which it is executed.

2.2 Why use VHDL for coding?


Higher-density programmable logic devices, including complex PLDs
(CPLD) and field-programmable gate arrays (FPGA), can be used to integrate large
amounts of logic in a single IC. Semi-custom and full-custom application-specific
integrated circuit (ASIC) devices are also used for integrating large amounts of digital
logic, but CPLDs and FPGAs provide additional flexibility: they can be used with tighter
schedules, for low-volume products, and for first production runs even of high-volume
products.
When designing with larger-capacity CPLDs and FPGAs of 500 to more than
100,000 gates, Boolean equations or gate-level descriptions can no longer be used to
complete a design quickly and efficiently. VHDL provides high-level language constructs
that enable designers to describe large circuits and bring products to market rapidly. It
supports the creation of design libraries in which to store components for reuse in
subsequent designs. Because it is a standard language (IEEE Standard 1076), VHDL
provides portability of code between synthesis and simulation tools, as well as
device-independent design.
An appropriate design methodology is one that increases the efficiency
of designers. More specifically, its description language should make a design easy to
capture, understand and maintain; be well defined rather than open to interpretation; be
an open standard accepted by industry; allow designs to be ported from one EDA
environment to another, so that modules can be packaged and reused; support complex,
hierarchical designs from gate level to system level; serve for the description,
simulation and synthesis of logic circuits; and support multiple levels of design
description. VHDL satisfies all of these requirements for digital logic design. For
the combined purpose of documentation, synthesis and simulation of both devices and
systems, VHDL is an excellent choice.
VHDL is a product of the VHSIC program funded by the Department of
Defense in the 1970s and 1980s. VHDL was established as IEEE Standard 1076 in 1987.
In 1993, the IEEE standard was updated and an additional VHDL standard, IEEE 1164,
was adopted. In 1996, IEEE 1076.3 became a VHDL synthesis standard.

2.3 Capabilities of VHDL


1) Power and Flexibility:
VHDL has powerful language constructs with which to write succinct
code descriptions of complex control logic. It also has multiple levels of design
description for controlling design implementation. It supports design libraries and the
creation of reusable components, and it provides design hierarchies for creating modular
designs.
2) Device-Independent Design:
VHDL allows a design to be created without first choosing a device
for implementation. With one design description, many device architectures can
be targeted. VHDL also permits multiple styles of design description. For example,
a 2-bit equality comparator (aeqb = '1' when a = b) can be written in any of these styles:
Netlists:
U1: xor2 port map (a(0), b(0), x(0));
U2: xor2 port map (a(1), b(1), x(1));
U3: nor2 port map (x(0), x(1), aeqb);
Boolean Equations:
aeqb <= (a(0) XOR b(0)) NOR (a(1) XOR b(1));
Concurrent Statements:
aeqb <= '1' when a = b else '0';
Sequential Statements (inside a process):
if a = b then aeqb <= '1';
else aeqb <= '0';
end if;
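Because these descriptions specify the same circuit, their behavior can be checked against each other. The Python sketch below models the netlist, Boolean-equation and behavioral forms of the comparator and compares them exhaustively; the concurrent and sequential forms compute the same function, so one behavioral model covers both. The gate functions xor2 and nor2 are simple stand-ins for the library components named in the netlist, and each 2-bit value is held as a tuple (bit 0, bit 1), an assumption of this sketch.

```python
from itertools import product


def xor2(x, y):
    """Stand-in for the xor2 library gate."""
    return x ^ y


def nor2(x, y):
    """Stand-in for the nor2 library gate."""
    return 1 - (x | y)


def aeqb_netlist(a, b):
    x0 = xor2(a[0], b[0])      # U1
    x1 = xor2(a[1], b[1])      # U2
    return nor2(x0, x1)        # U3


def aeqb_boolean(a, b):
    # aeqb <= (a(0) XOR b(0)) NOR (a(1) XOR b(1));
    return 1 - ((a[0] ^ b[0]) | (a[1] ^ b[1]))


def aeqb_behavioral(a, b):
    # aeqb <= '1' when a = b else '0';  (same function as the if/else form)
    return 1 if a == b else 0


# Compare the descriptions over all 16 combinations of two 2-bit inputs.
for a in product((0, 1), repeat=2):
    for b in product((0, 1), repeat=2):
        assert aeqb_netlist(a, b) == aeqb_boolean(a, b) == aeqb_behavioral(a, b)
print("netlist, Boolean and behavioral forms agree")
```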

3) Portability:
VHDL's portability allows the same design description that is synthesized to be
simulated. Simulating a several-thousand-gate design description before
synthesizing it can save considerable time: a design flaw discovered at this stage
can be corrected before the design implementation stage. Because VHDL is a
standard, one design description can be taken from one simulator to another, one
synthesis tool to another and one platform to another.
4) Benchmarking Capabilities:
Device-independent design and portability allow a design to be benchmarked
using different device architectures and different synthesis tools. A
completed design description can be synthesized, creating
logic for the architecture of choice. The results can be evaluated and
the device that best fits the design requirements can be chosen. The same can be
done with synthesis tools to measure the quality of the synthesis.
5) ASIC Migration:
The efficiency that VHDL provides allows a product to reach the market
quickly when the design is synthesized to a CPLD or FPGA. When production
volumes reach appropriate levels, VHDL facilitates the development of an ASIC.
Sometimes, the exact code used with the PLD can be used with an ASIC.
6) Quick Time-to-Market and Low Cost:
VHDL and programmable logic pair well together to facilitate a speedy
design process. VHDL permits designs to be described quickly.
Programmable logic eliminates non-recurring engineering (NRE) expenses and facilitates
quick design iterations. Synthesis makes it all possible. VHDL and programmable logic
combine as a powerful vehicle to bring products to market in a very short time.

2.4 Basic structure of a VHDL file


A digital system in VHDL consists of a design entity that can contain
other entities that are then considered components of the top-level entity. Each entity is
modeled by an entity declaration and an architecture body. One can consider the entity
declaration as the interface to the outside world that defines the input and output signals,
while the architecture body contains the description of the entity and is composed of
interconnected entities, processes and components, all operating concurrently. In a typical
design there will be many such entities connected together to perform the desired
function.
i. Entity declaration
The entity declaration defines the NAME of the entity and lists the input
and output ports. The general form is as follows:
entity NAME_OF_ENTITY is
[generic (generic_declarations);]
port (signal_names: mode type;
signal_names: mode type);
end [NAME_OF_ENTITY];

An entity always starts with the keyword entity, followed by its name and
the keyword is. Next are the port declarations using the keyword port. An entity
declaration always ends with the keyword end, optionally followed by the name
of the entity.
ii. Architecture body
The architecture body specifies how the circuit operates and how it is
implemented. As discussed earlier, an entity or circuit can be specified in a
variety of ways, such as behavioral, structural (interconnected components), or a
combination of the above.
The architecture body looks as follows,
architecture architecture_name of NAME_OF_ENTITY is
-- Declarations
-- components declarations
-- signal declarations
-- constant declarations
-- function declarations
-- procedure declarations
-- type declarations
begin
-- Statements
:
end architecture_name;
iii. Library and Packages: library and use keywords
A library can be considered as a place where the compiler stores
information about a design project. A VHDL package is a file or module that contains
declarations of commonly used objects, data types, component declarations, signals,
procedures and functions that can be shared among different VHDL models.
For example, std_logic is defined in the package ieee.std_logic_1164 in the
ieee library. In order to use std_logic, one needs to specify the library and
package. This is done at the beginning of the VHDL file using the library and the
use keywords as follows:
library ieee;
use ieee.std_logic_1164.all;
The .all suffix makes all declarations in the ieee.std_logic_1164 package visible.
One can add other libraries and packages. The syntax to declare a package is as
follows:
-- Package declaration
package name_of_package is
package declarations
end package name_of_package;
-- Package body declarations
package body name_of_package is
package body declarations
end package body name_of_package;

2.6 Behavioral modeling: Sequential statements


VHDL provides means to represent digital circuits at different levels of
abstraction, such as behavioral and structural modeling. The basis
for sequential modeling is the process construct, which allows us to
model complex digital systems, in particular sequential circuits.
i. Process
A process statement is the main construct in behavioral modeling that
allows you to use sequential statements to describe the behavior of a system
over time. The syntax for a process statement is
[process_label:] process [ (sensitivity_list) ] [is]
[ process_declarations]
begin
list of sequential statements such as:
signal assignments
variable assignments

case statement
exit statement
if statement
loop statement
next statement
null statement
procedure call
wait statement
end process [process_label];
An example of a positive edge-triggered D flip-flop with an asynchronous clear is as follows.
library ieee;
use ieee.std_logic_1164.all;
entity DFF_CLEAR is
port (CLK, CLEAR, D : in std_logic;
Q : out std_logic);
end DFF_CLEAR;
architecture BEHAV_DFF of DFF_CLEAR is
begin
DFF_PROCESS: process (CLK, CLEAR)
begin
-- Active-high asynchronous clear has priority over the clock
if (CLEAR = '1') then
Q <= '0';
-- Otherwise capture D on the rising edge of CLK
elsif (CLK'event and CLK = '1') then
Q <= D;
end if;
end process;
end BEHAV_DFF;
A process is declared within an architecture and is a concurrent statement.
However, the statements inside a process are executed sequentially. Like other
concurrent statements, a process reads and writes signals and the values of the
interface (input and output) ports to communicate with the rest of the architecture.
One can thus make assignments to signals that are defined externally to the
process, such as the Q output of the flip-flop in the above example. The
expression CLK'event and CLK = '1' checks for a positive clock edge.
The sensitivity list is the set of signals to which the process is sensitive. Any
change in the value of a signal in the sensitivity list causes immediate
execution of the process. If the sensitivity list is not specified, one has to include a
wait statement to make sure that the process will suspend.
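The behavior of DFF_CLEAR can be illustrated with a small Python model that executes the process body once per call, as a simulator would after an event on CLK or CLEAR. This is a sketch, not a VHDL simulator: it assumes CLK starts at '0', starts Q at 'U' (the uninitialized std_logic value), and detects CLK'event by comparing against the previous CLK value. The class and method names are hypothetical.

```python
class DffClear:
    """Python model of DFF_CLEAR: active-high asynchronous clear,
    data captured on the rising edge of CLK."""

    def __init__(self):
        self.clk = '0'   # assumed initial clock level
        self.q = 'U'     # std_logic starts uninitialized

    def run_process(self, clk, clear, d):
        """Execute the process body once with the current input values."""
        clk_event = clk != self.clk      # models CLK'event
        self.clk = clk
        if clear == '1':                 # if (CLEAR = '1') then Q <= '0';
            self.q = '0'
        elif clk_event and clk == '1':   # elsif (CLK'event and CLK = '1') then Q <= D;
            self.q = d
        return self.q


ff = DffClear()
print(ff.run_process('1', '0', '1'))  # rising edge: Q becomes '1'
print(ff.run_process('0', '0', '0'))  # falling edge: Q holds '1'
print(ff.run_process('0', '1', '0'))  # clear asserted: Q becomes '0' with no clock edge
```

The third call shows why CLEAR must be in the sensitivity list: the output changes without any clock event.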
ii. If Statements
The if statement executes one of several sequences of statements, depending
on one or more conditions. The syntax is as follows:
if condition then
sequential statements
[elsif condition then
sequential statements ]
[else
sequential statements ]
end if;
Each condition is a Boolean expression. The if statement is performed by
checking each condition in the order in which they are presented until one evaluates
to TRUE. Nesting of if statements is allowed.
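Because conditions are checked in order, an if/elsif chain naturally describes priority logic. The Python sketch below mirrors such a chain for a hypothetical 4-input priority encoder (the name, the list-based request vector and the (valid, index) result are assumptions): req[3] is tested first, so it wins whenever several requests are active.

```python
def priority_encoder(req):
    """Return (valid, index) for a request vector indexed by bit number, req[0]..req[3].

    Mirrors an if/elsif chain: the first TRUE condition wins, so req[3]
    has the highest priority and req[0] the lowest.
    """
    if req[3]:
        return 1, 3
    elif req[2]:
        return 1, 2
    elif req[1]:
        return 1, 1
    elif req[0]:
        return 1, 0
    else:
        return 0, 0   # no request active


print(priority_encoder([1, 0, 1, 0]))  # bits 0 and 2 active -> (1, 2)
```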
iii. Case statements
The case statement executes one of several sequences of statements,
based on the value of a single expression. The syntax is as follows,
case expression is
when choices =>
sequential statements
when choices =>
sequential statements
-- branches are allowed
[ when others => sequential statements ]
end case;

The expression must evaluate to an integer, an enumerated type, or a
one-dimensional array such as a bit_vector. The case statement evaluates the
expression and compares the value to each of the choices. The when clause
corresponding to the matching choice will have its statements executed. The choices
must cover every possible value of the expression, which is why a when others
clause is commonly added.
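A case statement maps each value of the expression to one branch, much like a table lookup. The Python sketch below models a hypothetical 2-to-4 decoder written with a case on a 2-bit std_logic_vector selector, held here as a string; the dictionary default plays the role of when others, which catches values such as "1X" that the four explicit choices do not cover.

```python
def decode_2to4(sel):
    """Model of:
        case sel is
            when "00" => y <= "0001";
            when "01" => y <= "0010";
            when "10" => y <= "0100";
            when "11" => y <= "1000";
            when others => y <= "0000";
        end case;
    """
    choices = {"00": "0001", "01": "0010", "10": "0100", "11": "1000"}
    return choices.get(sel, "0000")   # when others branch


print(decode_2to4("10"))   # 0100
print(decode_2to4("1X"))   # 0000 (caught by when others)
```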
iv. Loop statements
A loop statement is used to repeatedly execute a sequence of sequential
statements. The syntax for a loop is as follows:
[ loop_label :]iteration_scheme loop
sequential statements
[next [label] [when condition];
[exit [label] [when condition];
end loop [loop_label];
Labels are optional but are useful when writing nested loops. The next and
exit statements are sequential statements that can only be used inside a loop.
The next statement skips the rest of the current loop iteration, and execution
proceeds to the next loop iteration. The exit statement skips the rest of the
statements, terminating the loop entirely, and execution continues with the next
statement after the exited loop.
There are three types of iteration schemes:
a. Basic Loop statement
This loop has no iteration scheme. It will be executed continuously
until it encounters an exit statement.
[ loop_label :] loop
sequential statements
[next [label] [when condition];
[exit [label] [when condition];
end loop [ loop_label];
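A basic loop without an exit condition (and likewise a while-loop whose condition
depends on signals) must contain at least one wait statement: signal values are updated
only when the process suspends, so without a wait the loop would never terminate.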

b. While-Loop statement
The while … loop evaluates a Boolean iteration condition. When
the condition is TRUE, the loop repeats; otherwise the loop is skipped and
execution continues with the statement after the loop. The syntax for the while-loop is
as follows:
[ loop_label :] while condition loop
sequential statements
[next [label] [when condition];
[exit [label] [when condition];
end loop[ loop_label ];
The condition of the loop is tested before each iteration, including
the first iteration. If it is false, the loop is terminated.
c. For-Loop statement
The for-loop uses an integer iteration scheme that determines the number
of iterations. The syntax is as follows,
[ loop_label :] for identifier in range loop
sequential statements
[next [label] [when condition];
[exit [label] [when condition];
end loop[ loop_label ];
The identifier (index) is automatically declared by the loop itself, so
one does not need to declare it separately. The value of the identifier
can only be read inside the loop and is not available outside it;
one cannot assign or change the value of the index.
The range must be a computable integer range in one of the following
forms, in which integer_expression must evaluate to an integer:
integer_expression to integer_expression
integer_expression downto integer_expression
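As an illustration of a for-loop with a downto range and an exit statement, the Python sketch below counts the leading zeros of an 8-bit value. The function name is hypothetical; the docstring shows the VHDL loop it mirrors (count is assumed to be a variable initialized to 0), with range(7, -1, -1) and break standing in for 7 downto 0 and exit. A next statement would correspond to Python's continue.

```python
def leading_zeros(d):
    """Count leading zeros of an 8-bit value.

    Mirrors:  for i in 7 downto 0 loop
                  exit when d(i) = '1';
                  count := count + 1;
              end loop;
    """
    count = 0
    for i in range(7, -1, -1):   # 7 downto 0
        if (d >> i) & 1:         # exit when d(i) = '1'
            break
        count += 1
    return count


print(leading_zeros(0b00010000))   # 3
```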
v. Wait statement
The wait statement suspends a process until an event occurs. There are
several forms of the wait statement:
wait until condition;

wait for time expression;
wait on signal;
wait;
The condition in the "wait until" statement must be TRUE for the process to
resume.
A few examples follow.
wait until CLK = '1';
wait until CLK = '0';
For the first example the process will wait until a positive-going clock edge
occurs, while for the second example, the process will wait until a negative-going clock
edge arrives.
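Why "wait until CLK = '1'" waits for an edge rather than a level can be shown with a small Python model over a string of sampled clock values. It is a simplification under stated assumptions: the process is taken to be already suspended at the wait statement, each character stands for one point in simulation time, and the function name is hypothetical.

```python
def resume_points(samples, level):
    """Indices at which 'wait until CLK = level;' would let the process resume.

    The condition is re-evaluated only when CLK has an event, so the process
    resumes when CLK changes to `level`, not while CLK merely stays there.
    """
    points = []
    for i in range(1, len(samples)):
        clk_event = samples[i] != samples[i - 1]   # an event on CLK
        if clk_event and samples[i] == level:
            points.append(i)
    return points


clk = "0011001101"
print(resume_points(clk, '1'))  # rising edges:  [2, 6, 9]
print(resume_points(clk, '0'))  # falling edges: [4, 8]
```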

2.7 Dataflow modeling

Behavioral modeling can be done with sequential statements using the process
construct or with concurrent statements. The latter method is usually called dataflow
modeling. Dataflow modeling describes a circuit in terms of its function and the flow
of data through the circuit. Concurrent signal assignments are event-triggered and
executed as soon as an event occurs on one of the signals in the expression.

i. Simple Concurrent signal assignments


A simple concurrent signal assignment is given in the following
examples,
Sum <= (A xor B) xor Cin;
Carry <= (A and B);
Z <= (not X) or Y after 2 ns;
The syntax is as follows:
Target_signal <= expression;
in which the value of the expression is transferred to the target_signal. As soon as an
event occurs on one of the signals in the expression, the expression is evaluated. The type
of the target_signal has to be the same as the type of the value of the expression.
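The first two example assignments look like parts of an adder, but Carry <= (A and B) is the carry of a half adder; a full adder also needs the Cin term, giving (A and B) or (Cin and (A xor B)). The Python sketch below evaluates the Sum expression together with that full-adder carry for every input combination and checks the pair against ordinary binary addition; it models only the logic, not the event-driven timing or the after clause.

```python
from itertools import product


def full_adder(a, b, cin):
    s = (a ^ b) ^ cin                    # Sum <= (A xor B) xor Cin;
    carry = (a & b) | (cin & (a ^ b))    # full-adder carry, including Cin
    return s, carry


for a, b, cin in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, cin)
    assert 2 * carry + s == a + b + cin  # the two output bits encode the sum
print("full adder matches binary addition for all 8 inputs")
```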
ii. Conditional Signal assignments

The syntax for the conditional signal assignment is as follows:
Target_signal <= expression when Boolean_condition else
expression when Boolean_condition else
expression;
The target signal will receive the value of the first expression whose
Boolean condition is TRUE. If no condition is TRUE, the target signal will
receive the value of the final expression. If more than one condition is TRUE, the value of
the expression belonging to the first TRUE condition is assigned.
iii. Selected Signal assignments
The selected signal assignment is similar to the conditional one described
above. The syntax is as follows,
with choice_expression select
target_name <= expression when choices,
expression when choices,
:
expression when choices;
The target is a signal that will receive the value of the expression whose choices
include the value of the choice_expression.
iv. Structural Modeling
A structural way of modeling describes a circuit in terms of components
and their interconnections. Each component must be defined beforehand (e.g. in a
package) and can itself be described as a structural, behavioral or dataflow model. At the
lowest level of the hierarchy, each component is described as a behavioral model, using
the basic logic operators defined in VHDL. In general, structural modeling is very good for
describing complex digital systems through a set of components arranged hierarchically.
A structural description can best be compared to a schematic block diagram that can be
described by the components and the interconnections. VHDL provides a formal way to
do this by:
a. Declare a list of components being used
b. Declare signals which define the nets that interconnect components

c. Label multiple instances of the same component so that each instance is
uniquely defined.
The components and signals are declared within the architecture body,
architecture architecture_name of NAME_OF_ENTITY is
-- Declarations
component declarations
signal declarations
begin
-- Statements
component instantiation and connections
:
end architecture_name;
v. Component declaration
Before components can be instantiated they need to be declared in the
architecture declaration section or in the package declaration. The component
declaration consists of the component name and the interface (ports). The syntax is as
follows:
component component_name [is]
[port (port_signal_names: mode type;
port_signal_names: mode type;
port_signal_names: mode type);]
end component [component_name];
The list of interface ports gives the name, mode and type of each port, as in
the entity declaration.
vi. Component Instantiation and interconnections
The component instantiation statement references a component that can be
i. previously defined at the current level of the hierarchy, or
ii. defined in a technology library (vendor's library).
The syntax for the component instantiation is as follows:
instance_name : component_name
port map (port1 => signal1, port2 => signal2, …, portn => signaln);
The instance name or label can be any legal identifier and is the name of
this particular instance. The component name is the name of the component declared
earlier using the component declaration statement. The port name is the name of the
port, and signal is the name of the signal to which the specific port is connected. The
above port map associates the ports with the signals through named association.
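Structural modeling and named association can be illustrated with a hypothetical 4-bit ripple-carry adder built from four instances (U0 to U3) of one full-adder component. In the Python sketch below, calling the component function with keyword arguments plays the role of port map (a => ..., b => ..., cin => ...), and the carry list stands in for the internal signals that interconnect the instances. This is a behavioral stand-in for a structural VHDL description, not VHDL itself.

```python
def full_adder(a, b, cin):
    """Leaf component: one-bit full adder (behavioral)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_adder_4(a, b, cin=0):
    """Structural model: instances U0..U3 of full_adder chained by carry nets."""
    carry = [cin]                       # internal signals c(0) .. c(4)
    total = 0
    for i in range(4):                  # instance U<i> handles bit i
        # keyword arguments play the role of named association (port => signal)
        s, cout = full_adder(a=(a >> i) & 1, b=(b >> i) & 1, cin=carry[i])
        total |= s << i
        carry.append(cout)
    return total, carry[4]              # 4-bit sum and carry out


print(ripple_adder_4(9, 8))   # (1, 1): 9 + 8 = 17 = 1 + 16
```

With named association the order of the connections is irrelevant, just as full_adder(cin=..., a=..., b=...) gives the same result in Python.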

CHAPTER 3: STEPS IN PROCESSOR DEVELOPMENT

Fig 3.1: Flow diagram showing development steps of the processor (VHDL design, RTL simulation, VHDL synthesis, post-synthesis simulation, place and route, place and route simulation, bit file generation)

 The design starts with a VHDL specification, which specifies the behavior expected in
the final design.
 In the next step, an RTL (Register Transfer Level) description is created, in which the
clock-by-clock behavior of the design is described.
 The correctness of the VHDL description is verified by RTL simulation using test vectors.
The output of this stage is a waveform display.

3.1 Process of building real hardware


 VHDL synthesis
 The goal of VHDL synthesis is to create a design that implements the
required functionality and meets the constraints on speed, area or
power.
 The VHDL synthesis tools convert the VHDL description into a netlist
for the target FPGA.
 The FPGA contains basic cells, which in turn contain gates.
 The output of this stage is a gate-level or macro-level netlist in a format
compatible with the place and route tools.
 Post-synthesis simulation
 Post-synthesis simulation is a functional, gate-level verification: the VHDL
netlist from the synthesis tool, together with a library of the synthesis primitives, is
loaded into the VHDL simulator, which runs the simulation using the RTL
verification vectors.
 Place and route tools
 These tools take the design netlist and implement the design in the target
technology device.
 FPGA vendors provide these tools. The inputs to the place and route
tools include:
1. Netlist
2. Timing constraints
3. Placement constraints
4. Device information

 After the cells are placed, the router makes the appropriate connections.
 The outputs are a data file used to implement the chip, which describes the
connections required for the FPGA macrocells to implement the required
functionality, and a timing file, which describes the timing of the
programmed FPGA.
 Place and route simulation verifies the results of the above process.
 After completing the hardware platform design entry, generate the
bitstream (BIT) file that represents the completed hardware platform.
 To create the BIT file for the implemented design, a User Constraints File
(UCF) must first be set up.
 The UCF specifies pinouts and timing constraints. It can also control a
variety of other hardware implementation features, such as the
configurable electrical characteristics of the FPGA I/O signals.
 To make the design permanent, the above steps can be repeated to target an ASIC device.
The place and route tools for an ASIC device can be obtained from the
corresponding ASIC vendor or an EDA (Electronic Design Automation)
vendor.

CHAPTER 4: DESIGN OF 8 BIT PROCESSOR

The processor that we embarked on developing was an 8-bit processor that
could execute most of the instructions of the Intel 8085. The chief difference between our
processor and the 8085 is that ours is based on the Harvard Architecture. This
architecture enables the execution of instructions in a single cycle. The division of the
program and data memories requires the use of separate addressing, but the advantage is
that they can be accessed at the same time, which can be used in pipelining. Our
processor does not implement pipelining, but with a few improvements it could be
extended to support it. Unlike other processors commonly developed using HDL tools
today, our processor has a built-in programmed ROM. Thus any program may be fed into
the processor, and it can be used in the field or in industry to perform the required control
tasks, once testing and timing are checked. This chapter outlines the main
architectural features of our processor and gives a functional description of all the
components.

4.1 Architecture of CPU


The architecture of our processor is relatively simple. The idea was to get a strong
understanding of the working intricacies of a real time processor. The main architectural
features of our processor are:
 8 bit data handling capability (8 bit data bus)
 256 byte ROM or program memory
 128 byte RAM or data memory
 Separate addressing of program memory and data memory
 16 bit instruction set
 8 bit program counter
 8 bit ALU
 8 bit accumulator and 3 general purpose registers
 One output port
 Master Reset and clock inputs

Fig 4.1 Processor schematic (inputs: Clock, Reset; 8 bit Output)

Fig 4.2 Architecture of the processor

4.2 Functional description of modules
The various components or modules of our processor were designed, tested and
simulated separately for operational correctness. This section briefly describes each
module, its functions, features and design characteristics.

4.2.1 ALU
The Arithmetic and Logic unit of our processor was the first module to be
developed. In computing, the arithmetic logic unit (ALU) is a digital circuit that performs
arithmetic and logical operations. The ALU is a fundamental building block of the CPU
of a computer and even the simplest microprocessors contain one for purposes such as
maintaining timers. The processors found inside modern CPUs and GPUs accommodate
very powerful and very complex ALUs; a single component may contain a number of
ALUs.
An ALU must process numbers using the same format as the rest of the digital circuit. In modern processors this is almost always the two's complement binary representation. Early computers used a variety of number systems, including one's complement, sign-magnitude format and even true decimal systems with ten tubes per digit.
ALUs for each of these number systems required different designs, and this influenced the current preference for two's complement. It is the representation in which additions and subtractions are easiest to calculate: subtracting a number is the same as adding its two's complement, so a single adder handles both operations.
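The following minimal Python sketch makes the two's complement argument concrete. Python serves only as a behavioral model here, and the function names are hypothetical, not part of our VHDL design. It performs 8-bit subtraction using nothing but an adder and a bitwise inversion.

```python
MASK = 0xFF  # keep every result to 8 bits, like the processor's data path


def negate8(x):
    """Two's complement of an 8-bit value: invert all bits, then add 1."""
    return (~x + 1) & MASK


def sub8(a, b):
    """Compute a - b using only addition: a + (two's complement of b)."""
    return (a + negate8(b)) & MASK


def to_signed(x):
    """Read an 8-bit pattern as a signed number in the range -128..127."""
    return x - 0x100 if x & 0x80 else x


print(sub8(5, 3))             # 2
print(hex(sub8(3, 5)))        # 0xfe
print(to_signed(sub8(3, 5)))  # -2
```

The same adder therefore serves both signed and unsigned operands. Only the interpretation of the 8-bit result differs.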
Most ALUs can perform the following operations:
• Integer arithmetic operations (addition, subtraction and sometimes multiplication and division, though these are more expensive)
• Bitwise logic operations (AND, NOT, OR, XOR)
• Bit-shifting operations (shifting or rotating a word by a specified number of bits to the left or right, with or without sign extension). A shift by one bit to the left multiplies a value by 2, and a shift by one bit to the right divides it by 2, discarding the remainder.
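The shift-as-multiply/divide rule can be checked with a small Python model of 8-bit shifts. This is a sketch only: the function names are hypothetical, and it does not claim that our SRA and SLA instructions use these exact logical or arithmetic variants.

```python
MASK = 0xFF  # 8-bit word


def shift_left(x):
    """Shift left by one bit; returns (result, bit shifted out of position 7)."""
    return (x << 1) & MASK, (x >> 7) & 1


def shift_right_logical(x):
    """Shift right by one bit, filling with 0; returns (result, bit shifted out of position 0)."""
    return x >> 1, x & 1


def shift_right_arithmetic(x):
    """Shift right by one bit, copying the sign bit 7 into place (sign extension)."""
    return (x >> 1) | (x & 0x80)


print(shift_left(21))                     # (42, 0): 21 * 2
print(shift_right_logical(7))             # (3, 1): 7 // 2, remainder shifted out
print(hex(shift_right_arithmetic(0xFC)))  # 0xfe: -4 / 2 = -2 when read as signed
```

The doubling only holds while the result fits in 8 bits: shift_left(0x81) gives (0x02, 1), with the lost bit appearing as the shifted-out bit.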
An engineer can design an ALU to calculate any operation, however complicated it is. The problem is that the more complex the operation, the more expensive the ALU is, the more space it uses in the processor and the more power it dissipates. Engineers therefore always look for a compromise: an ALU powerful enough to make the processor (or other circuit) fast, yet not so complex as to become prohibitive. Suppose, for example, that you need to calculate the square root of a number. The digital engineer will examine the following options to implement this operation:
1. Design an extraordinarily complex ALU that calculates the square root of any number in a single step. This is called calculation in a single clock.
2. Design a very complex ALU that calculates the square root of any number in several steps. The intermediate results pass through a series of circuits arranged in a line, like a factory production line. The ALU can therefore accept new numbers before it has finished calculating the previous ones, and it produces results as fast as a single-clock ALU, although the results start to flow out only after an initial delay. This is called a calculation pipeline.
3. Design a complex ALU that calculates the square root through several steps. This is called iterative calculation and usually relies on control from a complex control unit with built-in microcode.
4. Design a simple ALU in the processor and sell a separate, specialized and costly processor that the customer can install beside this one and that implements one of the options above. This is called a co-processor.
5. Tell the programmers that there is no co-processor and no emulation, so they have to write their own algorithms to calculate square roots in software. This is done by software libraries.
6. Emulate the existence of the co-processor. Whenever a program attempts to perform the square root calculation, the processor checks whether a co-processor is present and uses it if there is one. If there isn't one, it interrupts the program and invokes the operating system to perform the square root calculation through a software algorithm. This is called software emulation.
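Option 5 can be illustrated with a short software routine. The sketch below is a hypothetical example, not part of our processor's software. It implements the classic bit-by-bit integer square root, which needs only shifts, additions, subtractions and comparisons, which is exactly what a simple ALU provides. Python stands in for the assembly code such a library would contain.

```python
def isqrt16(n):
    """Floor square root of a 16-bit unsigned integer using only shifts, adds and compares."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("operand must fit in 16 bits")
    root = 0
    bit = 1 << 14              # highest power of four that fits in 16 bits
    while bit > n:             # start from the highest power of four not above n
        bit >>= 2
    while bit:                 # each pass settles one bit of the result
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


print(isqrt16(10), isqrt16(16), isqrt16(65535))  # 3 4 255
```

Each pass of the loop fixes one bit of the root, so an 8-bit result takes up to eight iterations. This illustrates the several-step cost the text attributes to the slower options.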
The options above go from the fastest and most expensive to the slowest and least expensive. Therefore, while even the simplest computer can calculate the most complicated formula, the simplest computers will usually take a long time doing so, because the formula has to be calculated in several steps.
The inputs to the ALU are the data to be operated on (called operands) and a code from the control unit indicating which operation to perform. Its output is the result of the computation. In many designs the ALU also reads condition codes from, or writes them to, a status register. These codes indicate cases such as carry-in or carry-out, overflow and divide-by-zero.
The ALU of our processor has two 8 bit data inputs, 'a' and 'b'. The 'a' input is taken directly from the accumulator, and the 'b' input is attached to the data bus. Besides these inputs, the ALU has the following input signals:
• clk : in std_logic; (main clock signal)
• r : in std_logic; (reset input)
• and_enable : in std_logic;
• or_enable : in std_logic;
• not_enable : in std_logic;
• xor_enable : in std_logic;
• shiftleft_enable : in std_logic;
• shiftright_enable : in std_logic;
• add_enable : in std_logic;
• subtract_enable : in std_logic;
• multiply_enable : in std_logic;
• compare_enable : in std_logic;
The output of the ALU is connected directly to the accumulator, so any processed data is stored in the accumulator after instruction execution. In addition, the ALU has two output flag signals, zset (zero) and cset (carry). They are connected to the control unit so that these conditions can be detected while certain instructions are decoded.
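The behaviour described above can be summarised in a small behavioral model. This is a Python sketch under stated assumptions, not a transcription of our VHDL. The operation names mirror the *_enable inputs. The flag rules are assumptions chosen for illustration: cset acts as a borrow for subtraction, multiply keeps only the low byte, and compare leaves the accumulator value unchanged.

```python
MASK = 0xFF  # 8-bit data path


def alu(op, a, b):
    """One ALU operation on 8-bit operands; returns (result, zset, cset).

    'a' plays the accumulator input and 'b' the databus input. The op names
    mirror the *_enable signals; the flag rules are illustrative assumptions.
    """
    cset = 0
    if op == "add":
        total = a + b
        result, cset = total & MASK, total >> 8             # carry out of bit 7
    elif op == "subtract":
        result, cset = (a - b) & MASK, int(a < b)           # cset reused as borrow
    elif op == "multiply":
        product = a * b
        result, cset = product & MASK, int(product > MASK)  # low byte kept, cset marks overflow
    elif op == "compare":
        # Accumulator value passed through unchanged; only the flags carry information.
        return a, int(a == b), int(a < b)
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "xor":
        result = a ^ b
    elif op == "not":
        result = ~a & MASK
    elif op == "shiftleft":
        result, cset = (a << 1) & MASK, a >> 7              # bit 7 moves into cset
    elif op == "shiftright":
        result, cset = a >> 1, a & 1                        # bit 0 moves into cset
    else:
        raise ValueError(f"unknown ALU operation: {op}")
    return result, int(result == 0), cset


print(alu("add", 0xFF, 0x01))  # (0, 1, 1): result wraps to zero with a carry
```

Returning the flags alongside the result mirrors how zset and cset travel to the control unit next to the data path.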

4.2.2 Accumulator
The processor has an 8 bit accumulator. The accumulator has the following input signals:
• acc_rd_alu : in STD_LOGIC;
• acc_wr_alu : in STD_LOGIC;
• eni : in STD_LOGIC;
• eno : in STD_LOGIC;
• clk : in STD_LOGIC;
• r : in STD_LOGIC;
• d : in STD_LOGIC_VECTOR (7 downto 0);
• alu_in : in STD_LOGIC_VECTOR (7 downto 0);
The clk and r inputs are the clock and reset respectively. Besides these, the accumulator has two 8 bit inputs, 'd' and 'alu_in'. The 'd' input is connected directly to the databus of the processor. The 'alu_in' input is connected to the ALU and is used to store the results of arithmetic or logical operations after execution. acc_rd_alu and acc_wr_alu control the transfer of data between the accumulator and the ALU. eni and eno control writing data from the databus into the accumulator and reading it back out onto the databus. The outputs are 'q', which is connected to the databus, and 'alu_out', which is connected directly to the 'a' input of the ALU.
The accumulator, like all registers and the RAM of the processor, is sensitive to clock and reset but is not negative edge triggered. This allows data to be read from and written into the accumulator in the same clock cycle. Single-cycle instruction execution depends heavily on this provision.

4.2.3 Register Bank


Besides the accumulator, the processor has three general purpose registers, named B, C and D. The number of general purpose registers could easily be increased. We limited it to four registers including the accumulator to reduce the number of control signals to be generated. Also, the programs that we intend to run on our processor are simple demonstration programs that do not require many registers. The registers are all 8 bit and are sensitive to clock and reset signals. The register was first described as an entity 'latchreg' and tested. Three such registers were then grouped together to form the entity 'reg_bank' using a structural description. Each register in the reg_bank has a latched output and also a high impedance buffer. The registers are selected for input or output of data directly by the control unit. The input signals to the reg_bank are:
• B_rd : in STD_LOGIC;
• B_wr : in STD_LOGIC;
• C_rd : in STD_LOGIC;
• C_wr : in STD_LOGIC;
• D_rd : in STD_LOGIC;
• D_wr : in STD_LOGIC;
• clk : in STD_LOGIC;
• r : in STD_LOGIC;
• d : in STD_LOGIC_VECTOR (7 downto 0);

The 8 bit data input 'd' is common to all registers and is connected to the databus. Besides clk (clock) and r (reset), the input control signals are the read and write enable signals of the individual registers. The name of each signal identifies the register it controls. The output signals are 'qb', 'qc' and 'qd'. They are 8 bits each, are also latched, and are connected to the databus.
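A minimal Python sketch of the latchreg/reg_bank structure follows. It assumes that a register's *_wr signal captures the databus value on a clock edge, and that its *_rd signal enables the high impedance buffer onto the bus. The text does not fix this direction, so treat it as an assumption. All class and method names are hypothetical, and None stands for the 'Z' state.

```python
class LatchReg:
    """One 8-bit register of the bank (a software stand-in for the 'latchreg' entity)."""

    def __init__(self):
        self.value = 0

    def clock(self, r, wr, d):
        """Clock edge: reset clears the register, otherwise wr captures the databus value d."""
        if r:
            self.value = 0
        elif wr:
            self.value = d & 0xFF

    def q(self, rd):
        """Value driven onto the databus; None models the high impedance state 'Z'."""
        return self.value if rd else None


class RegBank:
    """B, C and D registers sharing the common 8-bit input d (the databus)."""

    def __init__(self):
        self.regs = {name: LatchReg() for name in ("B", "C", "D")}

    def clock(self, d, r=False, wr=()):
        """Apply one clock edge; wr names the registers whose *_wr signal is high."""
        for name, reg in self.regs.items():
            reg.clock(r, name in wr, d)

    def outputs(self, rd=()):
        """Return qb, qc and qd; only registers named in rd leave the 'Z' state."""
        return {name: reg.q(name in rd) for name, reg in self.regs.items()}


# Hypothetical register-to-register copy (C into B): load C, let C drive the bus, capture into B.
bank = RegBank()
bank.clock(0x42, wr={"C"})
bus = bank.outputs(rd={"C"})["C"]
bank.clock(bus, wr={"B"})
print(bank.outputs(rd={"B"}))  # {'B': 66, 'C': None, 'D': None}
```

Keeping one shared input d and separate enable signals matches the structural description, where a single latchreg design is instantiated three times.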

4.2.4 Data Memory (RAM)


In VHDL, a memory is typically defined as an array whose elements are themselves arrays of a data type (for example, an array of std_logic_vector). The processor has a 128 byte data memory made up of 128 registers of one byte each, addressed using an 8 bit address. The data memory is synchronised to the clock. Like the register bank and accumulator, it is not negative edge triggered, so that it can be read and written in the same cycle. The RAM can be written into and read from during program execution, and both its input and its output are connected to the data bus via latches. During read and write operations the RAM is addressed using the 'immediate' field of the instruction word. In addition to the 8 bit data input and output, the RAM has the following input signals:
• a : in std_logic_vector(7 downto 0);
• rd : in std_logic;
• wr : in std_logic;
• r : in std_logic;
• clk : in std_logic;
• en : in std_logic;
The 'a' input is used for addressing. The 'rd' (read), 'wr' (write) and 'en' (enable) signals control the reading out and writing in of data during program execution.
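The "array of arrays" idea and the same-cycle read/write can be sketched in Python, with a list of bytes playing the role of the VHDL array of std_logic_vector. Several details are assumptions, not taken from our VHDL: reset clears the memory, a write happens before a read in the same cycle (write-first), and an 8-bit address above 127 is rejected.

```python
class DataMemory:
    """128 x 8-bit data memory; the list stands in for VHDL's array of std_logic_vector."""

    SIZE = 128

    def __init__(self):
        self.cells = [0] * self.SIZE

    def clock(self, a, din=0, rd=False, wr=False, en=True, r=False):
        """One clock edge; returns the byte read out, or None if nothing is read."""
        if r:
            self.cells = [0] * self.SIZE  # assumption: reset clears the memory
            return None
        if not en:
            return None
        if not 0 <= a < self.SIZE:
            raise IndexError(f"address {a:#04x} is outside the 128-byte RAM")
        if wr:
            self.cells[a] = din & 0xFF
        return self.cells[a] if rd else None  # write-first: a same-cycle read sees the new byte


ram = DataMemory()
ram.clock(0x10, din=0x3C, wr=True)         # write only
print(ram.clock(0x10, rd=True))            # 60
print(ram.clock(0x10, rd=True, en=False))  # None: memory not enabled
```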

4.2.5 Program Memory (ROM)


The processor incorporates a 256 byte ROM or program memory. The ROM has 128 (2^7) registers of 16 bits each, a total of 256 bytes, addressed by 8 bits. The ROM is also synchronised to the clock. Its read operations are negative edge triggered and are sensitive to the read signal from the control unit and to the clock. The RAM and registers can be written into or read from at any time during the instruction execution cycle. The ROM, by contrast, is arranged so that the next instruction is loaded into the decoding unit only in the following clock cycle and not before. The ROM cannot be written into at any time during program execution; the instructions have to be loaded when the processor is designed with the HDL tools. This is a disadvantage for demonstration purposes. It is very useful, however, in industrial applications, where processors control various operations according to the program burned into them. The instructions from the ROM are transferred directly into the decoding and control unit over a 16 bit instruction bus. The addresses from which instructions are fetched are supplied by the program counter. The input signals to the ROM are:
• a : in std_logic_vector(7 downto 0);
• rd : in std_logic;
• r : in std_logic;
• clk : in std_logic;
• en : in std_logic;
The 'a' input is the address input. The 'rd' signal, together with the 'en' (enable) signal, places instructions from the ROM onto the instruction bus.
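A Python sketch of the ROM's behaviour follows: contents fixed at construction time, 128 words of 16 bits, and a word placed on the instruction bus only when both rd and en are high. The three-word program uses encodings from TABLE 4.1. The register code B = 01 is an assumption, and the class name is hypothetical.

```python
class ProgramMemory:
    """128 x 16-bit read-only program memory (128 words x 2 bytes = 256 bytes)."""

    WORDS = 128

    def __init__(self, program):
        if len(program) > self.WORDS:
            raise ValueError("program does not fit in 128 words")
        # Contents are fixed when the memory is built, as the VHDL ROM is fixed at design time;
        # unused words are filled with 0x0000, which decodes as NOP in TABLE 4.1.
        self._words = tuple(w & 0xFFFF for w in program) + (0,) * (self.WORDS - len(program))

    def falling_edge(self, a, rd, en):
        """Word placed on the 16-bit instruction bus at a falling clock edge, or None."""
        return self._words[a] if (rd and en) else None


rom = ProgramMemory([0x0905,   # MVI B, 05h (register code B = 01 assumed)
                     0xF400,   # MOV: acc <= B
                     0x7800])  # OUT
print(hex(rom.falling_edge(1, rd=True, en=True)))  # 0xf400
```

The class deliberately has no write method, which mirrors the fact that the program cannot change during execution.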

4.2.6 Program Counter
The program counter is used to address the program memory when fetching instructions. It is an 8 bit counter with synchronised load and reset, and it is cleared upon reset. The counter outputs the current count and increments it when the 'count' signal is high. The counter also has a 'load' signal, which is used to load the address of an instruction location in the ROM during jump operations. When 'load' is applied, an internal signal takes the value of the 'immediate' field of the instruction word. The counter then outputs this address on the next high 'count' signal. The program counter is negative edge triggered. It was checked for correct operation and timing accuracy, since the proper functioning of the counter is central to the timing of instruction execution.
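The load-then-count behaviour can be sketched in Python as follows. The priority reset > load > count and the class name are assumptions, not taken from our VHDL. The internal pending address reproduces the rule that a loaded target appears only on the next count.

```python
class ProgramCounter:
    """8-bit program counter with synchronous reset, count and load (negative edge model)."""

    def __init__(self):
        self.q = 0       # address currently driven onto the program address bus
        self._next = 1   # internal signal: address to present on the next count

    def falling_edge(self, r=False, count=False, load=False, immediate=0):
        """Apply one falling clock edge and return the counter output."""
        if r:                                     # reset clears the counter
            self.q, self._next = 0, 1
        elif load:                                # jump: remember the target from the immediate field
            self._next = immediate & 0xFF
        elif count:                               # present the pending address, then advance
            self.q = self._next
            self._next = (self._next + 1) & 0xFF  # 8-bit wraparound after 0xFF
        return self.q


pc = ProgramCounter()
trace = [pc.falling_edge(count=True), pc.falling_edge(count=True),
         pc.falling_edge(load=True, immediate=0x10), pc.falling_edge(count=True)]
print(trace)  # [1, 2, 2, 16]: the jump target appears only on the count after the load
```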

4.2.7 Databus
In computer architecture, a bus is a subsystem that transfers data between components inside a computer or between computers. Unlike a point-to-point connection, a bus can logically connect several peripherals over the same set of wires. Each bus defines its set of connectors to physically plug devices, cards or cables together. Early computer buses were literally parallel electrical buses with multiple connections. The term is now used for any physical arrangement that provides the same logical functionality as a parallel electrical bus. Modern computer buses can use both parallel and bit-serial connections. They can be wired in either a multi-drop (electrically parallel) or daisy-chain topology, or connected by switched hubs, as in the case of USB.
At one time, "bus" meant an electrically parallel system, with electrical conductors similar or identical to the pins on the CPU. This is no longer the case, and modern systems are blurring the lines between buses and networks. Buses can be parallel buses, which carry data words in parallel on multiple wires, or serial buses, which carry data in bit-serial form. Most serial buses have extra power and control connections, differential drivers and data connections in each direction. As a result, they usually have more conductors than the minimum of one used in the 1-Wire serial bus. As data rates increase, the problems of timing skew, power consumption, electromagnetic interference and crosstalk across parallel buses become more and more difficult to circumvent. One partial solution to this problem has been to double pump the bus. Often, a serial bus can actually be operated at higher overall data rates than a parallel bus, despite having fewer electrical connections, because a serial bus inherently has no timing skew or crosstalk. USB, FireWire and Serial ATA are examples of this. Multi-drop connections do not work well for fast serial buses, so most modern serial buses use daisy-chain or hub designs.
Most computers have both internal and external buses. An internal bus connects all the internal components of a computer to the motherboard (and thus to the CPU and internal memory). These buses are also referred to as local buses, because they are intended to connect to local devices, not to those in other machines or external to the computer. An external bus connects external peripherals to the motherboard. Network connections such as Ethernet are not generally regarded as buses, although the difference is largely conceptual rather than practical. The arrival of technologies such as InfiniBand and HyperTransport is further blurring the boundaries between networks and buses. Even the lines between internal and external are sometimes fuzzy. I²C can be used as either an internal bus or an external bus (where it is known as ACCESS.bus). InfiniBand is intended to replace both internal buses like PCI and external ones like Fibre Channel.
The processor has an 8 bit databus. The databus is simply a signal. It can be defined in VHDL as a global signal or, with the Xilinx ISE schematic editor, simply drawn as a connector. The databus terminates in the output port and is connected to all the registers and memory banks. Care must be taken when connecting the databus to the various components. Each component's output and input must be latched and connected to the databus through a high impedance buffer. Without such a buffer, the bus has several active sources at once, and its value appears as 'undefined' during simulation. During simulation, the Xilinx tool automatically uses wired OR gates when connecting the components to the databus if it detects multiple input and output terminals for the same signal.
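The 'undefined' bus value can be reproduced with a small Python model of bus resolution. The sketch follows the VHDL std_logic resolution rules for '0', '1' and 'Z' only; the other std_logic values are omitted. It is not the wired-OR logic mentioned above. Each component output is an 8-bit integer, or None when its high impedance buffer is disabled. The function names are hypothetical.

```python
def to_bits(value):
    """Bits driven by one component: 8 bits, or 'Z' on every wire when its buffer is off (None)."""
    return "Z" * 8 if value is None else format(value & 0xFF, "08b")


def resolve_bit(drives):
    """Resolve one wire from all its drivers, following std_logic rules for '0', '1' and 'Z'."""
    active = set(drives) - {"Z"}
    if not active:
        return "Z"           # nobody drives the wire
    if len(active) == 1:
        return active.pop()  # every active driver agrees on one value
    return "X"               # '0' and '1' fight: shown as undefined in simulation


def resolve_bus(outputs):
    """Value seen on the shared 8-bit databus given every component's output (int or None)."""
    if not outputs:
        return "Z" * 8
    wires = zip(*(to_bits(v) for v in outputs))  # group the drivers of each bit position
    return "".join(resolve_bit(w) for w in wires)


print(resolve_bus([None, 0x42, None]))  # 01000010: exactly one buffer enabled
print(resolve_bus([0x0F, 0xF0]))        # XXXXXXXX: two buffers enabled at once
```

Only the bits on which the drivers disagree become 'X'. For example, 0x0F and 0x0E together resolve to 0000111X.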

4.2.8 Address Bus


An address bus is a computer bus used by CPUs or DMA-capable units to communicate the physical address of the memory location that the requesting unit wants to access (read or write). The width of an address bus, along with the size of the addressable memory elements, determines how much memory can be accessed. For example, a 16 bit wide address bus (commonly used in the 8 bit processors of the 1970s and early 1980s) reaches across 2^16 = 65,536 memory locations. A 32 bit address bus (common in PC processors as of 2004) can address 2^32 = 4,294,967,296 locations. In most microcomputers each addressable location is 8 bits (one byte) wide. In such a case, the above examples translate to 64 kibibytes (KiB) and 4 gibibytes (GiB) respectively.
The address bus of our processor is simply a global signal, defined using the schematic editor just like the databus. Because our processor is based on the Harvard architecture, it has two address buses. The program address bus connects the program counter to the ROM or program memory. The data address bus connects the control unit to the RAM or data memory. Both buses are 8 bits wide.
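The address-width arithmetic above can be checked in a few lines of Python. The sketch assumes one byte per addressable location, as in the text. It also applies the same formula to this processor's two 8-bit address buses.

```python
def locations(address_bits):
    """Number of distinct locations an address bus of this width can select."""
    return 1 << address_bits


KIB, GIB = 1 << 10, 1 << 30

# With one byte per location, bytes = locations.
print(locations(16), locations(16) // KIB, "KiB")  # 65536 64 KiB
print(locations(32), locations(32) // GIB, "GiB")  # 4294967296 4 GiB

# This processor: each 8-bit address bus can select 256 locations;
# the 128-byte RAM and the 128-word ROM each use half of that range.
print(locations(8), 128 / locations(8))  # 256 0.5
```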

4.2.9 Output Port


Memory-mapped I/O (MMIO) and port I/O (also called port-mapped or PMIO)
are two complementary methods of performing input/output between the CPU and
peripheral devices in a computer. Another method is using dedicated I/O processors
commonly known as channels on mainframe computers that execute their own
instructions.
Memory-mapped I/O (not to be confused with memory-mapped file I/O) uses the same address bus to address both memory and I/O devices. The CPU instructions used to access memory are also used for accessing devices. To accommodate the I/O devices, areas of the CPU's addressable space must be reserved for I/O. The reservation might be temporary (the Commodore 64 could bank switch between its I/O devices and regular memory) or permanent. Each I/O device monitors the CPU's address bus and responds to any CPU access to its assigned address space. It then connects the data bus to the appropriate hardware register of the device.
Port-mapped I/O uses a special class of CPU instructions specifically for performing I/O. This is generally found on Intel microprocessors, specifically the IN and OUT instructions, which can read and write one to four bytes (outb, outw, outl) to an I/O device. I/O devices have an address space separate from general memory. This is implemented either by an extra "I/O" pin on the CPU's physical interface or by an entire bus dedicated to I/O. A device's direct memory access (DMA) is not affected by these CPU-to-device communication methods; in particular, it is not affected by memory mapping. This is because, by definition, DMA is a memory-to-device communication method that bypasses the CPU.
Hardware interrupts are yet another communication method between the CPU and peripheral devices. However, they are always treated separately, for a number of reasons. An interrupt is device-initiated, whereas the methods mentioned above are CPU-initiated. It is also unidirectional, as information flows only from the device to the CPU. Lastly, each interrupt line carries only one bit of information with a fixed meaning, namely "there is an interrupt".
The main advantage of port-mapped I/O is on CPUs with limited addressing capability. Because port-mapped I/O separates I/O access from memory access, the full address space can be used for memory. It is also obvious to a person reading an assembly language program listing (or, in rare instances, analyzing machine language) when I/O is being performed, because of the special instructions that can only be used for that purpose. The advantage of memory-mapped I/O is that it discards the extra complexity that port I/O brings. The CPU then requires less internal logic and is thus cheaper, faster, easier to build, consumes less power and can be physically smaller. This follows the basic tenets of reduced instruction set computing and is also advantageous in embedded systems. As 16-bit processors have become obsolete and been replaced by 32-bit and 64-bit processors in general use, reserving ranges of memory address space for I/O is less of a problem. Because regular memory instructions are used to address devices, all of the CPU's addressing modes are available for I/O as well as for memory.
Memory-mapped I/O hogs the address and data buses, because the mapped device is usually slower than main memory. Port-mapped I/O does not, if it operates via a dedicated I/O bus.
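The trade-off between the two schemes can be sketched in Python on a hypothetical 8-bit machine. The classes, their method names and the split point IO_BASE = 0xF0 are all illustrative assumptions. With memory-mapped I/O, ordinary loads and stores reach the devices, but the reserved range is lost to memory. With port-mapped I/O, the whole space stays available for memory, and devices are reached only through separate IN/OUT-style operations.

```python
class MemoryMappedSystem:
    """Devices share the memory address space; IO_BASE is a hypothetical split point."""

    IO_BASE = 0xF0  # assumption: top 16 of 256 addresses reserved for device registers

    def __init__(self):
        self.ram = bytearray(self.IO_BASE)           # only 240 bytes remain for memory
        self.device = bytearray(256 - self.IO_BASE)  # hardware registers of the devices

    def store(self, addr, value):
        # The same store operation reaches either RAM or a device, chosen by address decoding.
        if addr >= self.IO_BASE:
            self.device[addr - self.IO_BASE] = value
        else:
            self.ram[addr] = value

    def load(self, addr):
        if addr >= self.IO_BASE:
            return self.device[addr - self.IO_BASE]
        return self.ram[addr]


class PortMappedSystem:
    """Devices live in a separate port space reached only by dedicated IN/OUT operations."""

    def __init__(self):
        self.ram = bytearray(256)    # the whole 8-bit address space is available for memory
        self.ports = bytearray(256)  # separate I/O address space

    def store(self, addr, value):
        self.ram[addr] = value

    def load(self, addr):
        return self.ram[addr]

    def out(self, port, value):      # analogue of an OUT instruction
        self.ports[port] = value

    def in_(self, port):             # analogue of an IN instruction
        return self.ports[port]


mm, pm = MemoryMappedSystem(), PortMappedSystem()
mm.store(0xF2, 7)   # reaches device register 2, not memory
pm.store(0xF2, 9)   # ordinary memory write
pm.out(0xF2, 3)     # same number, different address space
print(len(mm.ram), mm.device[2], pm.load(0xF2), pm.in_(0xF2))  # 240 7 9 3
```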
The processor has one output port. It is defined as an output marker connected to the databus during the final core design in the schematic editor. During the implementation stage, the output port of the FPGA is mapped to the appropriate pins of the piggyback board for display on LEDs.

4.2.10 Decoding and Control Unit


A control unit is the part of a CPU or other device that directs its operation. The outputs of the unit control the activity of the rest of the device, and a control unit can be thought of as a finite state machine. The control unit is the circuitry that controls the flow of data through the processor and coordinates the activities of the other units within it. In a way, it is the "brain within the brain", as it controls what happens inside the processor, which in turn controls the rest of the PC. The instruction register output, the ALU flags and external control signals constitute the inputs of the controller. In one such design, the outputs of the controller are 38 control signals going to the data path and a shadow output indicating that the controller is handling a shadow instruction. Control units are often implemented as a microprogram stored in a control store. Words of the microprogram are selected by a microsequencer. The bits from those words directly control the different parts of the device, including the registers, arithmetic and logic units, instruction registers, buses and off-chip input/output.
The decoding and control unit of our processor is the heart of all operations. It generates the count signal for the program counter, controls the reading of instructions from the ROM, and decodes all the instructions to produce the appropriate control signals.
The control unit inputs, as shown, are the clock, the reset, the instruction word and the carry and zero flags. One output is dat_out, an 8 bit data signal corresponding to the 'immediate' field of the instruction word. Besides dat_out, the control unit generates the following control signals:
• rom_rd : out STD_LOGIC;
• ram_rd : out STD_LOGIC;
• ram_wr : out STD_LOGIC;
• ram_addr : out STD_LOGIC_VECTOR (7 downto 0);
• ram_en : out STD_LOGIC;
• rom_en : out STD_LOGIC;
• prg_load : out STD_LOGIC;
• prg_count : out STD_LOGIC;
• r_count : out STD_LOGIC;
• acc_rd : out STD_LOGIC;
• acc_wr : out STD_LOGIC;
• B_rd : out STD_LOGIC;
• B_wr : out STD_LOGIC;
• C_rd : out STD_LOGIC;
• C_wr : out STD_LOGIC;
• D_rd : out STD_LOGIC;
• D_wr : out STD_LOGIC;
• and_enable : out std_logic;
• or_enable : out std_logic;
• not_enable : out std_logic;
• xor_enable : out std_logic;
• shiftleft_enable : out std_logic;
• shiftright_enable : out std_logic;
• add_enable : out std_logic;
• subtract_enable : out std_logic;
• multiply_enable : out std_logic;
• compare_enable : out std_logic;
The control unit is sensitive to clock, and is negative edge triggered. The simulation and
testing of instruction decoding was done by giving different input instruction words to the
controller, and studying the output signals generated.
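The decoding step can be illustrated for the ALU enable outputs alone. The following Python sketch takes the opcode-to-operation mapping from TABLE 4.1 and asserts exactly one *_enable signal for each arithmetic or logical instruction. The dictionary and function names are hypothetical. The sketch does not model the timing or the register and memory control signals of our actual controller.

```python
# Opcodes from TABLE 4.1 (ins bits 15..11) that drive an ALU enable output.
ALU_OPCODES = {
    0b00101: "add_enable",        # ADD
    0b00110: "subtract_enable",   # SUB
    0b00111: "multiply_enable",   # MUL
    0b01000: "compare_enable",    # CMP
    0b01001: "and_enable",        # AND
    0b01010: "xor_enable",        # XOR
    0b01011: "not_enable",        # CMA
    0b01100: "shiftright_enable", # SRA
    0b01101: "shiftleft_enable",  # SLA
    0b01110: "or_enable",         # ORA
}
ALU_SIGNALS = sorted(set(ALU_OPCODES.values()))


def alu_controls(ins):
    """Value of every ALU enable output for one 16-bit instruction word (at most one is high)."""
    signals = {name: 0 for name in ALU_SIGNALS}
    if (ins >> 12) != 0b1111:                      # type 1 (MOV) never uses the ALU
        name = ALU_OPCODES.get((ins >> 11) & 0b11111)
        if name:
            signals[name] = 1
    return signals


print([s for s, v in alu_controls(0x2900).items() if v])  # ['add_enable']
```

Feeding instruction words to a function like this and inspecting the outputs mirrors the test procedure described above.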

4.3 Instruction format


Each instruction of our processor is 16 bits (one word) long. There are two types of instructions. Instructions that transfer data between two registers are specified using a four bit opcode, and the remaining instructions are specified using a five bit opcode. This considerably increases the number of instructions the instruction word can encode, so more instructions can be added to our processor design than we have currently implemented. The instructions are fed directly into the control unit, which decodes them according to the format and type of the instruction word. The following figures give the instruction formats of both types of instructions.

Bits 15-12: opcode | bits 11-10: source | bits 9-8: dest | bits 7-0: immediate
Fig 4.3 (a) type 1 instruction

Bits 15-11: opcode | bit 10: unused | bits 9-8: dest | bits 7-0: immediate
Fig 4.3 (b) type 2 instruction

Type 1 instructions: The first type of instruction uses both a source register and a destination register. Each register is specified using 2 bits of the instruction word, since there are 4 registers including the accumulator. The source register is specified in ins(11 downto 10) and the destination register in ins(9 downto 8). The opcode of this type of instruction is only 4 bits long and is specified in ins(15 downto 12). Currently our processor has only one instruction of this type (MOV). The last 8 bits of the instruction form the 'immediate' data field.
Type 2 instructions: The second type of instruction specifies only a destination register, in ins(9 downto 8). The opcode of this type of instruction is 5 bits long and is specified in ins(15 downto 11). The last 8 bits are again used for immediate data.
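The two formats can be separated with a few shifts and masks, as in this Python sketch of the field extraction (the function name is hypothetical). It assumes that any word whose top four bits are 1111 is a type 1 instruction. This holds for TABLE 4.1, because every five-bit opcode there begins with 0.

```python
def decode(ins):
    """Split a 16-bit instruction word into the fields of Fig 4.3 (a) and (b)."""
    if not 0 <= ins <= 0xFFFF:
        raise ValueError("instruction words are 16 bits")
    fields = {
        "dest": (ins >> 8) & 0b11,     # ins(9 downto 8)
        "immediate": ins & 0xFF,       # ins(7 downto 0)
    }
    if (ins >> 12) == 0b1111:          # type 1: 4-bit opcode in ins(15 downto 12)
        fields.update(type=1, opcode=0b1111, source=(ins >> 10) & 0b11)  # ins(11 downto 10)
    else:                              # type 2: 5-bit opcode in ins(15 downto 11)
        fields.update(type=2, opcode=ins >> 11)
    return fields


print(decode(0xF600))  # MOV with source code 1 and destination code 2
print(decode(0x0905))  # MVI (opcode 1) with destination code 1 and immediate 5
```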

4.4 Instruction set


The instruction set of the processor is given in TABLE 4.1. There are three types of instructions: data transfer, arithmetic and logical, and machine control. Of these, we were able to successfully simulate the execution of the data transfer and machine control instructions. The arithmetic instructions were simulated with the ALU alone, because connecting the ALU to the accumulator and register bank caused many timing issues.

4.5 CPU core design


The core of the CPU was designed using the schematic editor of the Xilinx ISE suite. The advantage of the schematic editor is that it saves the time needed to write port maps for the various components during structural modeling of the core. The CPU core was first developed and tested in parts, and then the entire core was designed. The CPU core is shown in APPENDIX A, along with all its components.

TABLE 4.1 INSTRUCTION SET

Mnemonic  Definition            ins(15:0)            Description
MOV       move register         1111-Rs-Rd-xxxxxxxx  Rd <= Rs
MVI       move immediate        00001-x-Rd-data      Rd <= immediate data
LDA       load accumulator      00010-x-00-ram_addr  acc <= content of ram_addr
STA       store accumulator     00011-x-00-ram_addr  content of ram_addr <= acc
LXI       load register         00100-x-Rd-ram_addr  Rd <= content of ram_addr
ADD       add register          00101-x-Rd-xxxxxxxx  add Rd, acc
SUB       subtract register     00110-x-Rd-xxxxxxxx  sub acc, Rd
MUL       multiply register     00111-x-Rd-xxxxxxxx  multiply acc, Rd
CMP       compare register      01000-x-Rd-xxxxxxxx  compare acc, Rd
AND       and register          01001-x-Rd-xxxxxxxx  and acc, Rd
XOR       xor register          01010-x-Rd-xxxxxxxx  xor acc, Rd
CMA       not accumulator       01011-x-00-xxxxxxxx  complement accumulator
SRA       shift right acc by 1  01100-x-00-xxxxxxxx  shift acc contents right by 1
SLA       shift left acc by 1   01101-x-00-xxxxxxxx  shift acc contents left by 1
ORA       or register           01110-x-Rd-xxxxxxxx  or acc, Rd
OUT       output                01111-x-xx-xxxxxxxx  output acc contents
NOP       no operation          00000-x-xx-xxxxxxxx  no processing

(x = don't care bit; Rs and Rd are 2 bit register codes)
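Programs for the ROM must be written as 16-bit words in these formats. The following minimal Python assembler sketch shows how TABLE 4.1 maps mnemonics to words. The opcodes come from the table. The register codes B = 01, C = 10 and D = 11 are assumptions, because the table only implies that the accumulator is 00. Don't-care bits are set to 0, and all names are hypothetical.

```python
# Five-bit opcodes from TABLE 4.1 (placed in ins(15 downto 11)).
OPCODES = {
    "NOP": 0b00000, "MVI": 0b00001, "LDA": 0b00010, "STA": 0b00011,
    "LXI": 0b00100, "ADD": 0b00101, "SUB": 0b00110, "MUL": 0b00111,
    "CMP": 0b01000, "AND": 0b01001, "XOR": 0b01010, "CMA": 0b01011,
    "SRA": 0b01100, "SLA": 0b01101, "ORA": 0b01110, "OUT": 0b01111,
}
# Assumed 2-bit register codes: only the accumulator's 00 is implied by the table.
REG = {"A": 0b00, "B": 0b01, "C": 0b10, "D": 0b11}


def assemble(mnemonic, rd="A", rs="A", imm=0):
    """Encode one instruction as a 16-bit word; don't-care bits (x) are set to 0."""
    m = mnemonic.upper()
    if m == "MOV":  # type 1: 1111-Rs-Rd-xxxxxxxx
        return (0b1111 << 12) | (REG[rs] << 10) | (REG[rd] << 8)
    if not 0 <= imm <= 0xFF:
        raise ValueError("immediate field is 8 bits")
    return (OPCODES[m] << 11) | (REG[rd] << 8) | imm  # type 2: opcode-x-Rd-immediate


program = [
    assemble("MVI", rd="B", imm=0x05),  # B <= 05h
    assemble("MOV", rd="A", rs="B"),    # acc <= B
    assemble("STA", imm=0x10),          # RAM location 10h <= acc
    assemble("OUT"),                    # output acc contents
]
print([f"{w:04X}" for w in program])  # ['0905', 'F400', '1810', '7800']
```

The resulting words are what would be written into the ROM contents when the processor is built.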

CHAPTER 5: SIMULATION AND TESTING

5.1 Xilinx ISE 10.1 Overview

Design Flow Overview


The following steps are involved in the realization of a digital system using Xilinx
FPGAs, as illustrated by the following figure.

Fig 5.1 Design flow overview

1. Design Entry
The first step is to enter your design. This can be done by creating "Source" files. Source files can be created in different formats, such as a schematic or a Hardware Description Language (HDL) such as VHDL, Verilog or ABEL. A project design consists of a top-level source file and various lower-level source files. Any of these files can be either a schematic or an HDL file.
2. Design Synthesis
The synthesis step creates netlist files from the various source files. The netlist
files can serve as input to the implementation module.
3. Design Verification (simulation)
This is an important step that should be done at various stages of the design. The
simulator is used to verify the functionality of a design (functional simulation), the
behavior and the timing (timing simulation) of your circuit. Timing simulation is run
after implementing your circuit in the FPGA since it needs to know the actual
placement and routing to find out the exact speed and timing of the circuit.
4. Design Implementation
After the netlist file has been generated (synthesis step), implementation converts the logic design into a physical file that can be downloaded onto the target device (e.g. a Virtex FPGA). This step involves three sub-steps: translating the netlist, mapping, and Place & Route.

Graphical Environment overview


Sources Window
This window contains the design source files for a project. These are the source
files that you created or added to the project. A drop down list at the top of sources
window allows you to select source files that are associated with a particular design
aspect such as Synthesis/Implementation or Simulation.
Processes Window
The Processes window lists the processes available for the source file selected in the Sources window. Typically you select a particular process that you want to perform on the selected source file, such as simulation or implementation. To run a process, double-click it. When a process has been executed successfully, a check mark icon appears next to it. When you run a high-level process, the Project Navigator automatically runs all the associated lower-level processes.

Fig 5.2 Xilinx 10.1 Graphical Environment
5.2 XILINX Isim simulator / waveform editor
The ISE Simulator / Waveform Editor can be used to create and simulate a test bench or test fixture within the Project Navigator framework. The Waveform Editor can be used to graphically enter stimuli and the expected response, and then to generate a VHDL test bench or a Verilog test fixture.
Creating a Test Bench Waveform Using the Waveform Editor:
To create a test bench with the ISE Simulator Waveform Editor:
i. Select time_cnt in the Sources tab.
ii. Select Project > New Source.
iii. In the New Source Wizard, select Test Bench Waveform as the source type.
iv. Type time_cnt_tb as the file name.
v. Click Next.
vi. In the Select dialog box, the time_cnt file is the default source file because it is selected in the Sources tab (step i).
vii. Click Next.
viii. Click Finish.
The Waveform Editor opens in ISE, and the Initialize Timing dialog box is displayed, in which the timing parameters used during simulation are specified. The Clock Time High and Clock Time Low fields together define the clock period at which the design must operate. The Input Setup Time field defines when inputs must be valid. The Output Valid Delay field defines the time after the active clock edge when the outputs must be valid.
ix. Fill in the fields of the Initialize Timing dialog box according to the needs of the design. For example:
♦ Clock Time High: 10
♦ Clock Time Low: 10
♦ Input Setup Time: 5
♦ Output Valid Delay: 5
x. Select GSR (FPGA) in the Global Signals section.
xi. Change the Initial Length of Test Bench to 3000.
xii. Click Finish.
Applying Stimulus
In the Waveform Editor, in the blue cell, we can apply a transition (high/low).
The width of this cell is determined by the Input setup delay and the Output valid delay.
Enter the following input stimuli:
1. Click the CE cell at time 110 ns to set it high (CE is active high).
2. Click the CLR cell at time 150 ns to set it high.
3. Click the CLR cell at time 230 ns to set it low.
4. Click the Save icon in the toolbar.
The new test bench waveform source (time_cnt_tb.tbw) is automatically added to
the project.
5. Select time_cnt_tb.tbw in the Sources tab.
6. Double-click Generate Self-Checking Test Bench in the Processes tab.
A test bench containing output data and self-checking code is generated and added
to the project. The created test bench can be used to compare data from later simulation runs.
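The three stimulus clicks can be viewed as a schedule of (time, value) transitions. The Python sketch below models that schedule and answers the question "what value does this input hold at time t?". Two points are assumptions for illustration: CE and CLR start low at 0 ns, and the helper name value_at is hypothetical. This is not how ISE stores a .tbw file.

```python
# Stimulus entered in the Waveform Editor, as (time_ns, value) transitions.
# Assumption: both inputs start low at 0 ns.
STIMULUS = {
    "CE": [(0, 0), (110, 1)],
    "CLR": [(0, 0), (150, 1), (230, 0)],
}


def value_at(signal, t_ns):
    """Return the value an input holds at t_ns (its last transition at or before t_ns)."""
    value = None
    for time_ns, level in STIMULUS[signal]:
        if time_ns > t_ns:
            break
        value = level
    return value


for t in (100, 110, 150, 229, 230):
    print(t, value_at("CE", t), value_at("CLR", t))
```

The model shows that CLR is held high from 150 ns up to, but not including, 230 ns, while CE stays high from 110 ns onward.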

Fig 5.3 Waveform Editor - Initialize Timing Dialog Box

Behavioral Simulation Using ISE Simulator


Since the project now has a test bench, behavioral simulation can be performed on the
design using the ISE Simulator. ISE is fully integrated with the ISE Simulator: it lets
the simulator create the work directory, compile the source files, load the design, and
perform the simulation based on the simulation properties.
To select ISE Simulator as the project simulator:
1. In the Sources tab, right-click the device line.
2. Select Properties.
3. In the Project Properties dialog box, select ISE Simulator in the Simulator field.

Fig 5.4 Applying stimulus in waveform editor

Locating the Simulation Processes


The simulation processes in ISE are used to run a simulation of the design in the
ISE Simulator. To locate the ISE Simulator processes:
1. In the Sources tab, select Behavioral Simulation in the Sources for field.
2. Select the test bench file (stopwatch_tb).
3. Click the + beside Xilinx ISE Simulator to expand the process hierarchy.
The following simulation processes are available:
i. Check Syntax
This process checks for syntax errors in the test bench.
ii. Simulate Behavioral Model
This process starts the design simulation.
iii. Generate a self-checking HDL test bench
This process generates a self-checking HDL test bench equivalent to a test bench
waveform (TBW) file and adds the test bench to the project. You can also use this
process to update an existing self-checking test bench. The test bench generated by
this process contains output data and self-checking code that can be used to compare the
data from later simulation runs.
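Conceptually, a self-checking test bench replays the stimulus, samples the outputs at each check point, and compares them with the values recorded from a reference run. The generated test bench itself is HDL code. The Python sketch below only illustrates the comparison step in a language-neutral way. The data layout (a dict keyed by check time), the output name Q and the sample values are all hypothetical.

```python
def compare_runs(expected, actual):
    """Compare sampled outputs against recorded expected values.

    expected, actual: {time_ns: {output_name: value}}
    Returns a list of mismatch messages; an empty list means the run passed.
    """
    mismatches = []
    for t_ns in sorted(expected):
        for name, want in expected[t_ns].items():
            got = actual.get(t_ns, {}).get(name)
            if got != want:
                mismatches.append(f"{t_ns} ns: {name} expected {want}, got {got}")
    return mismatches


# Hypothetical recorded outputs and a later run in which one sample differs.
recorded = {115: {"Q": 0}, 135: {"Q": 1}, 155: {"Q": 2}}
later_run = {115: {"Q": 0}, 135: {"Q": 1}, 155: {"Q": 3}}
print(compare_runs(recorded, later_run))
```

An empty result means the later run reproduced every recorded output. Any entry pinpoints the check time and the signal that diverged.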
Specifying Simulation Properties
The behavioral simulation will be performed on the stopwatch design after some process
properties have been set. ISE allows several ISE Simulator properties to be set in
addition to the simulation netlist properties. To view the behavioral simulation
properties, and to modify them for this example:
1. In the Sources tab, select the test bench file (stopwatch_tb).
2. Click the + sign next to ISE Simulator to expand the hierarchy in the Processes
tab.
3. Right-click the Simulate Behavioral Model process.
4. Select Properties.
5. In the Process Properties dialog box, set the Property display level to
Advanced. This global setting makes all available properties visible.
6. Change the Simulation Run Time to 2000 ns.
7. Click Apply and click OK.
The process properties window is shown in Fig 5.5.
Performing Simulation
Once the process properties have been set, the ISE Simulator can be run. To start
the behavioral simulation, double-click Simulate Behavioral Model. ISE Simulator
creates the work directory, compiles the source files, loads the design, and performs
simulation for the time specified. Most of this design runs at 100 Hz and would take
a significant amount of time to simulate. The first outputs to change after RESET is
released are SF_D and LCD_E, at around 33 ms, far beyond the 2000 ns run time set
above. This is why the counter may appear not to work in a short simulation. For the
purpose of this tutorial, only the DCM signals are monitored to verify that they work
correctly.
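A quick calculation shows why the 2000 ns run time cannot reach the first SF_D and LCD_E transitions. The sketch assumes the 20 ns (50 MHz) input clock reported in the signal measurements of this section. It also treats the 100 Hz rate as exact, for simplicity. The other constants come from the text.

```python
CLK_PERIOD_NS = 20          # 50 MHz input clock (assumed from the test bench)
RUN_TIME_NS = 2_000         # Simulation Run Time property
FIRST_OUTPUT_NS = 33e6      # ~33 ms until SF_D and LCD_E first change
TICK_PERIOD_NS = 1e9 / 100  # one period of the 100 Hz logic = 10 ms

cycles_in_run = RUN_TIME_NS // CLK_PERIOD_NS
cycles_to_first_output = FIRST_OUTPUT_NS / CLK_PERIOD_NS
runs_needed = FIRST_OUTPUT_NS / RUN_TIME_NS
cycles_per_tick = TICK_PERIOD_NS / CLK_PERIOD_NS

print(cycles_in_run)             # 100
print(cycles_to_first_output)    # 1650000.0
print(runs_needed)               # 16500.0
print(cycles_per_tick)           # 500000.0
```

A 2000 ns run covers only 100 input clock cycles, while the first visible outputs need about 1.65 million. This is why the DCM signals are the practical thing to check in a short run.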
Fig 5.5 Process properties for ISE simulator

Adding Signals
To view signals during the simulation, you must add them to the Waveform
window. ISE automatically adds all the top-level ports to the Waveform window.
Additional signals are displayed in the Sim Hierarchy window. The following procedure
explains how to add additional signals in the design hierarchy. For the purpose of this
tutorial, add the DCM signals to the waveform.
To add additional signals in the design hierarchy:
1. In the Sim Hierarchy window, click the + next to stopwatch_tb to expand the
hierarchy.
2. Click the + next to uut stopwatch to expand the hierarchy.
3. Click the + next to dcm_inst in the Sim Instances tab.
4. Click and drag CLKIN_IN from the Sim Objects window to the Waveform window.
5. Select the following signals:
♦ RST_IN
♦ CLKFX_OUT
♦ CLK0_OUT
♦ LOCKED_OUT
To select multiple signals, hold down the Ctrl key.
6. Drag all the selected signals to the waveform. Alternatively, right-click a selected
signal and select Add To Waveform.
By default, ISE Simulator records data only for the signals that have been added
to the waveform window while the simulation is running. Therefore, when new signals
are added to the waveform window, we must rerun the simulation for the desired amount
of time.
Analyzing the Signals
The DCM signals can now be analyzed to verify that they work as expected. CLK0_OUT
should be 50 MHz and CLKFX_OUT approximately 26 MHz. The DCM outputs are valid only
after LOCKED_OUT goes high; therefore, the DCM signals are analyzed only after
LOCKED_OUT has gone high.
ISE Simulator can add markers to measure the time between signal transitions. To
measure CLK0_OUT:
1. If necessary, zoom in on the waveform.
2. Click the Measure Marker icon.
3. Place the marker on the first rising edge transition on the CLK0_OUT signal after the
LOCKED_OUT signal has gone high.
4. Click and drag the other end of the marker to the next rising edge.
5. Look at the top of the waveform for the distance between the markers. The
measurement should read 20.0 ns. This corresponds to 50 MHz, the input frequency from
the test bench, which the DCM also produces on its CLK0 output.
6. Measure CLKFX_OUT using the same steps. The measurement should read 38.5 ns, which
corresponds to approximately 26 MHz (1/38.5 ns ≈ 25.97 MHz). The behavioral simulation
is now complete.
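The marker measurement above amounts to taking the difference between the first two rising edges after lock and inverting it. The Python sketch below reproduces that arithmetic. The helper names and the edge timestamps in the usage example are hypothetical and are not read from the simulator.

```python
def period_to_mhz(period_ns):
    """Frequency in MHz of a clock with the given period in ns."""
    return 1000.0 / period_ns


def period_after_lock(rising_edges_ns, locked_ns):
    """Period between the first two rising edges at or after LOCKED_OUT goes high,
    mirroring the marker measurement described above."""
    edges = [t for t in rising_edges_ns if t >= locked_ns]
    if len(edges) < 2:
        raise ValueError("need two rising edges after LOCKED_OUT goes high")
    return edges[1] - edges[0]


print(round(period_to_mhz(20.0), 2))   # 50.0  (CLK0_OUT)
print(round(period_to_mhz(38.5), 2))   # 25.97 (CLKFX_OUT)
print(period_after_lock([5, 25, 45, 65], locked_ns=30))   # 20 (hypothetical edges)
```

Ignoring edges before LOCKED_OUT matters because the DCM outputs are not guaranteed to be valid until lock.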
5.3 Simulation of components
Each component of the processor was first simulated separately to check its timing
behavior. The simulations used a clock period of 100 ns, with the input setup time and
output valid delay both set to 0 ns. Such zero delays in input and output availability
will prove to be a serious timing issue at the implementation stage, but they were chosen
here to simplify simulation and functional verification. The simulation waveforms of the
components and instructions are given in APPENDIX B.
CHAPTER 6: APPLICATIONS IN INDUSTRY

i. As NGCP
The motion of a space vehicle can be very complex. However, the motion of any rigid
body can be treated as a combination of translational and rotational motion. In
three-dimensional space, a translation is a movement that can be resolved into
components along one or more of the three axes, and a rotation is one that can be
resolved into components about one or more of those axes. All these motions are
controlled by a special processor called a Navigation Guidance Control Processor (NGCP).
As the name suggests, it navigates, guides and controls the vehicle. The rotational and
translational motions of a space probe differ at different altitudes; they are
pre-programmed and controlled by the processor. Such guidance and control processors
must meet certain specifications. In particular, they should have very few, tightly
controlled interrupts compared with commercial processors, because even a single
unexpected interrupt can lead to catastrophic failure. The criteria for an NGCP are:
1. All the processors should conform to the military standard MIL-STD-462.
2. The interrupts have to be controlled.
3. Enabling all interrupts can damage the machine.
4. Only specified interrupts are allowed to work.
5. Although commercial processors offer better speed and efficiency, they are not
considered, since all interrupts are enabled in such processors.
One such NGCP, developed indigenously by the Vikram Sarabhai Space Centre, is the
"VIKRAM" processor, which is dedicated solely to use as an NGCP. Another such processor is
SAYEH (Simple Architecture Yet Enough Hardware), which can be used as an NGCP or
implemented on an FPGA or as an ASIC, although its primary use is as an NGCP.
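Criteria 2 to 4 above amount to interrupt masking. An interrupt is serviced only if it is both pending and explicitly enabled. The Python sketch below illustrates this with a bit mask. The interrupt names and bit positions are hypothetical and do not describe any real NGCP.

```python
# Hypothetical interrupt sources, one bit each (illustrative only).
IRQ_TIMER = 0
IRQ_SENSOR = 1
IRQ_TELEMETRY = 2
IRQ_DEBUG = 3


def make_mask(*lines):
    """Build an enable mask with only the listed interrupt lines enabled."""
    mask = 0
    for line in lines:
        mask |= 1 << line
    return mask


def serviceable(pending, mask):
    """Return the pending interrupt lines that the mask allows, lowest first."""
    allowed = pending & mask
    return [line for line in range(allowed.bit_length()) if (allowed >> line) & 1]


# Only the timer and sensor interrupts are enabled; everything else is ignored.
mask = make_mask(IRQ_TIMER, IRQ_SENSOR)
pending = (1 << IRQ_SENSOR) | (1 << IRQ_TELEMETRY) | (1 << IRQ_DEBUG)
print(serviceable(pending, mask))   # [1]
```

Even though three interrupts are pending, only the enabled sensor interrupt reaches the processor.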

ii. As FPGA
Before the advent of programmable logic, custom logic circuits were built at the
board level using standard components, or at the gate level as expensive
application-specific (custom) integrated circuits. An FPGA is an integrated circuit that
contains many (64 to over 10,000) identical logic cells that can be viewed as standard
components. Each logic cell can independently take on any one of a limited set of
personalities. The individual cells are interconnected by a matrix of wires and
programmable switches. A user's design is implemented by specifying the simple logic
function of each cell and selectively closing the switches in the interconnect matrix.
The array of logic cells and the interconnect form a fabric of basic building blocks for
logic circuits. Complex designs are created by combining these basic blocks into the
desired circuit.
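This sketch assumes one common way to give a logic cell its "personality": a small look-up table (LUT). The cell stores one output bit for every combination of its inputs, so configuring the cell means loading that truth table. The Python sketch below models a 2-input LUT cell and wires two identical cells into a half adder to mimic the interconnect. The class, method and function names are hypothetical.

```python
class LogicCell:
    """A k-input look-up-table cell. Bit i of truth_table is the output for the
    input pattern whose binary value is i (input 0 = least significant bit)."""

    def __init__(self, k, truth_table):
        self.k = k
        self.truth_table = truth_table

    def evaluate(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        index = 0
        for position, bit in enumerate(inputs):
            index |= (bit & 1) << position
        return (self.truth_table >> index) & 1


# Two identical cells configured with different personalities.
xor_cell = LogicCell(2, 0b0110)   # output 1 for input patterns 1 and 2
and_cell = LogicCell(2, 0b1000)   # output 1 only for input pattern 3


def half_adder(a, b):
    """'Interconnect': route the same two inputs to both cells; return (sum, carry)."""
    return xor_cell.evaluate(a, b), and_cell.evaluate(a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

Reprogramming a cell only changes its truth_table integer, which reflects how the same hardware cell can take on different functions.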
Field Programmable Gate Arrays are two-dimensional arrays of logic blocks and
flip-flops with electrically programmable interconnections between the logic blocks.
Because the interconnections consist of electrically programmable switches, an FPGA
differs from a custom IC, which is programmed using integrated circuit fabrication
technology to form metal interconnections between logic blocks. FPGAs can be used to
implement almost any hardware design. One common use is to prototype a system that will
eventually be implemented as an ASIC.
FPGAs comprise an array of uncommitted circuit elements, called configurable logic
blocks, and interconnect resources; unlike a custom IC, an FPGA is configured by the end
user through programming. FPGAs have been responsible for a major shift in the way
digital circuits are designed.
There are two basic categories of FPGAs on the market today:
1. SRAM-based FPGAs
2. Antifuse-based FPGAs
SRAM-based FPGAs can be reprogrammed many times, whereas antifuse-based FPGAs are
one-time programmable.
Antifuses are initially open circuits and take on a low resistance only when
programmed. Antifuses are suitable for FPGAs because they can be built using modified
CMOS technology. As an example, Actel's antifuse structure, known as PLICE, is depicted
in the figure. Applications of FPGAs include digital signal processing (DSP),
software-defined radio, aerospace and defense systems, ASIC prototyping, medical
imaging, computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation and a growing range of other areas.
iii. As ASIC
An ASIC is an Application Specific Integrated Circuit. With the advent of VLSI in the
1980s, engineers began to realize the advantage of designing an IC that was customized,
or tailored, to a particular system or application rather than using standard ICs
alone. Microelectronic system design then becomes a matter of defining the functions
that can be implemented using standard ICs and then implementing the remaining logic
functions with one or more custom ICs. The types of ASICs are:
1. Full custom ASIC
2. Semi custom ASIC
CHAPTER 7: CONCLUSION

Microprocessors have evolved from the crude calculating machines they were at the time
of their genesis into highly capable, fast controllers that can be programmed to meet
almost all the needs of this age of automated production and maintenance. Processor
design has moved from drawing the actual circuit by hand, to describing it in an HDL,
and now to using powerful simulation and design tools such as Xilinx ISE, which make
building a processor a much easier task.
We were able to successfully simulate all the arithmetic, logic and data transfer
instructions of the processor. In this process, the timing advantages offered by the
modified Harvard architecture over the original von Neumann architecture were
significant: all the instructions took at most two clock cycles to execute.
This project has been very intense and involved. Throughout the development, we
encountered and came to understand the various issues involved in designing an actual
processor, including the working intricacies of the various modules and the timing
issues and their solutions. Our main difficulties were with timing: operating the whole
processor from one synchronizing clock proved to be more complex and less predictable
than we had anticipated. Although not all the timing issues were solved, we were able to
solve some of them using our own techniques. Whether these solutions will work in a
real-world FPGA application remains to be seen. Nevertheless, we learned to use the
Xilinx tools for the design, simulation and implementation of a hardware model on an FPGA.
VLSI design is one of the leading industries in the semiconductor market today. In
today's computer-controlled world, almost everything in the industrial domain needs a
processor for control. By designing this processor, we were able to familiarize
ourselves with the various stages involved in designing a custom-made IC for a
user-defined purpose.
APPENDICES
APPENDIX A: CPU CORE
APPENDIX B: SIMULATION WAVEFORMS

1. MVI A,95H and MOV C,A

The value 95H is first written into A using the MVI instruction, and the contents of A
are then moved into C using the MOV C,A instruction.
2. LDA 0AH

The RAM address 0AH contains the value 12H. This value is loaded into the accumulator
using the instruction LDA 0AH.
3. STA 04H

The value 2AH is first loaded into the accumulator using the MVI instruction. The
contents of A are then written into RAM address 04H using the STA instruction.
4. LXI C, 04H

RAM address 04H contains 0CH. This value is loaded into the C register using the LXI
instruction.
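The four data-transfer examples can be replayed on a small reference model to get the register and RAM values to look for in the waveforms. This Python sketch follows the behavior described in this appendix, including LXI loading a register from a RAM address as in example 4. The class name, the 16-byte RAM size and the 8-bit masking are assumptions. The model is not the VHDL CPU core of Appendix A.

```python
class DataPathModel:
    """Hypothetical register/RAM model of the data-transfer examples above."""

    def __init__(self, ram_size=16):               # RAM size is an assumption
        self.reg = {"A": 0, "B": 0, "C": 0}
        self.ram = [0] * ram_size

    def mvi(self, r, value):       # MVI r,value: load an immediate byte
        self.reg[r] = value & 0xFF

    def mov(self, dst, src):       # MOV dst,src: copy one register to another
        self.reg[dst] = self.reg[src]

    def lda(self, addr):           # LDA addr: A <- RAM[addr]
        self.reg["A"] = self.ram[addr]

    def sta(self, addr):           # STA addr: RAM[addr] <- A
        self.ram[addr] = self.reg["A"]

    def lxi(self, r, addr):        # LXI r,addr: r <- RAM[addr], as described in example 4
        self.reg[r] = self.ram[addr]


m1 = DataPathModel()               # 1. MVI A,95H and MOV C,A
m1.mvi("A", 0x95)
m1.mov("C", "A")

m2 = DataPathModel()               # 2. LDA 0AH with RAM[0AH] = 12H
m2.ram[0x0A] = 0x12
m2.lda(0x0A)

m3 = DataPathModel()               # 3. MVI A,2AH then STA 04H
m3.mvi("A", 0x2A)
m3.sta(0x04)

m4 = DataPathModel()               # 4. LXI C,04H with RAM[04H] = 0CH
m4.ram[0x04] = 0x0C
m4.lxi("C", 0x04)

print(hex(m1.reg["C"]), hex(m2.reg["A"]), hex(m3.ram[0x04]), hex(m4.reg["C"]))
```

Each example starts from a fresh model because the appendix describes them as separate simulations. For instance, example 3 writes 2AH to address 04H, while example 4 assumes that address holds 0CH.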
5. ADD A, C

03H is written into register C and 04H into register A, and the contents of C are then
added to A. Finally, the contents of A are moved into register B.
6. SUB A, C

03H is written into register C and 04H into register A, and the contents of C are then
subtracted from A. Finally, the contents of A are moved into register B.
7. MUL A, C

03H is written into register C and 04H into register A, and A is then multiplied by
the contents of C. Finally, the contents of A are moved into register B.
8. AND A, C

03H is written into register C and 04H into register A, and the contents of C are then
ANDed with A. Finally, the contents of A are moved into register B.
9. XOR A, C

03H is written into register C and 04H into register A, and the contents of C are then
XORed with A. Finally, the contents of A are moved into register B.
10. CMA

Here, the value 03H is loaded into the accumulator using MVI A,03H and then
complemented, giving the result FCH. The contents of A are then moved to B.
11. SLA

Here, the value 03H is loaded into the accumulator using MVI A,03H and then shifted
left by 1, giving the result 06H. The contents of A are then moved to B.
12. SRA

Here, the value 03H is loaded into the accumulator using MVI A,03H and then shifted
right by 1, giving the result 01H. The contents of A are then moved to B.
13. ORA C

03H is written into register C and 04H into register A, and the contents of C are then
ORed with A. Finally, the contents of A are moved into register B.
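The expected results of the arithmetic and logic examples (items 5 to 13) can be cross-checked with a small reference model. The Python sketch below makes three assumptions that go beyond what the appendix shows: an 8-bit datapath, MUL keeping only the low byte of the product, and SRA acting as a logical right shift. The appendix only uses small positive operands. The function names are hypothetical.

```python
MASK = 0xFF  # assumed 8-bit datapath

BINARY_OPS = {
    "ADD": lambda a, c: (a + c) & MASK,
    "SUB": lambda a, c: (a - c) & MASK,    # wraps modulo 256 on borrow
    "MUL": lambda a, c: (a * c) & MASK,    # assumption: low byte kept in A
    "AND": lambda a, c: a & c,
    "XOR": lambda a, c: a ^ c,
    "ORA": lambda a, c: a | c,
}

UNARY_OPS = {
    "CMA": lambda a: ~a & MASK,            # one's complement of A
    "SLA": lambda a: (a << 1) & MASK,      # shift left by 1, MSB discarded
    "SRA": lambda a: a >> 1,               # assumption: logical shift right by 1
}


def run_binary(op, a=0x04, c=0x03):
    """Load A and C, apply op to A, then MOV B,A; return B."""
    return BINARY_OPS[op](a, c)


def run_unary(op, a=0x03):
    """MVI A,03H, apply op to A, then MOV B,A; return B."""
    return UNARY_OPS[op](a)


for op in BINARY_OPS:
    print(op, f"{run_binary(op):02X}H")
for op in UNARY_OPS:
    print(op, f"{run_unary(op):02X}H")
```

Running it prints the B values the waveforms should show: ADD 07H, SUB 01H, MUL 0CH, AND 00H, XOR 07H, ORA 07H, CMA FCH, SLA 06H and SRA 01H.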
