
Digital Design:

An Embedded Systems
Approach Using Verilog
Chapter 7
Processor Basics
Portions of this work are from the book, Digital Design: An Embedded
Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan
Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.
Verilog
Digital Design Chapter 7 Processor Basics 2
Embedded Computers
A computer as part of a digital system
Performs processing to implement or control the
system's function
Components
Processor core
Instruction and data memory
Input, output, and input/output controllers
For interacting with the physical world
Accelerators
High-performance circuit for specialized functions
Interconnecting buses
Memory Organization
Von Neumann architecture
Single memory for instructions and data
Harvard architecture
Separate instruction and data memories
Most common in embedded systems
[Figure: CPU connected to an accelerator, instruction memory, data memory, and input, output, and I/O controllers]
Bus Organization
Single bus for low-cost low-performance
systems
Multiple buses for higher performance
[Figure: bus connections among CPU, accelerator, instruction memory, data memory, and input, output, and I/O controllers]
Microprocessors
Single-chip processor in a package
External connections to memory and
I/O buses
Most commonly seen in general-purpose
computers
E.g., Intel Pentium family, PowerPC, ...

Microcontrollers
Single chip combining
Processor
A small amount of instruction/data memory
I/O controllers
Microcontroller families
Same processor, varying memory and I/O
8-bit microcontrollers
Operate on 8-bit data
Low cost, low performance
16-bit and 32-bit microcontrollers
Higher performance
Processor Cores
Processor as a component in an FPGA or
ASIC
In FPGA, can be a fixed-function block
E.g., PowerPC cores in some Xilinx FPGAs
Or can be a soft core
Implemented using programmable resources
E.g., Xilinx MicroBlaze, Altera Nios-II
In ASIC, provided as an IP block
E.g., ARM, PowerPC, MIPS, Tensilica cores
Can be customized for an application
Digital Signal Processors
DSPs are processors optimized for
signal processing operations
E.g., audio, video, sensor data; wireless
communication
Often combined with a conventional
core for processing other data
Heterogeneous multiprocessor
Instruction Sets
A processor executes a program
A sequence of instructions, each performing a
small step of a computation
Instruction set: the repertoire of available
instructions
Different processor types have different instruction
sets
High-level languages: more abstract
E.g., C, C++, Ada, Java
Translated to processor instructions by a compiler
Instruction Execution
Instructions are encoded in binary
Stored in the instruction memory
A processor executes a program by
repeatedly
Fetching the next instruction
Decoding it to work out what to do
Executing the operation
Program counter (PC)
Register in the processor holding the
address of the next instruction
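The fetch, decode, execute cycle above can be sketched as a small software model. This is only an illustration: the three-instruction toy machine (`LOADI`, `ADDI`, `HALT`), the tuple "encoding", and the function name `run` are assumptions made for the example, not part of any real instruction set.

```python
# Toy fetch/decode/execute loop. The instruction names and the tuple
# "encoding" are hypothetical; real processors encode instructions in binary.

def run(instruction_memory):
    pc = 0          # program counter: address of the next instruction
    acc = 0         # a single accumulator register
    trace = []      # addresses fetched, in order

    while True:
        instruction = instruction_memory[pc]   # fetch
        trace.append(pc)
        pc += 1                                # PC now points at the next instruction
        opcode, operand = instruction          # decode
        if opcode == "LOADI":                  # execute
            acc = operand
        elif opcode == "ADDI":
            acc = (acc + operand) & 0xFF       # keep the result to 8 bits
        elif opcode == "HALT":
            return acc, trace
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

program = [("LOADI", 5), ("ADDI", 7), ("HALT", 0)]
result, trace = run(program)
print(result, trace)
```

The PC is incremented immediately after the fetch, so during execution it already holds the address of the following instruction.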
Data and Endian-ness
Instructions operate on data from the data memory
Byte: 8-bit data
Data memory is usually byte addressed
16-bit, 32-bit, 64-bit words of data
[Figure: byte layout of 8-bit, 16-bit, and 32-bit data in byte-addressed memory.
16-bit data occupies addresses m and m + 1; 32-bit data occupies addresses n to n + 3.
Little endian: least sig. byte at the lowest address (m or n), most sig. byte at the highest (m + 1 or n + 3).
Big endian: most sig. byte at the lowest address, least sig. byte at the highest.]
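The two byte orderings can be checked with a short sketch. The value `0x12345678` and the variable names are arbitrary choices for illustration; `int.to_bytes` is used here only as a convenient way to lay out the bytes in address order.

```python
# Byte-addressed view of a 32-bit word under each byte ordering.
value = 0x12345678

little = value.to_bytes(4, byteorder="little")  # least sig. byte at address n
big = value.to_bytes(4, byteorder="big")        # most sig. byte at address n

for offset in range(4):
    print(f"n + {offset}: little={little[offset]:02X} big={big[offset]:02X}")
```

Reading the bytes back with the same ordering recovers the original value; reading them with the other ordering does not.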
The Gumnut Core
A small 8-bit soft core
Can be used in FPGA designs
Instruction set illustrates features typical of 8-bit
cores and processors in general
Programs written in assembly language
Each processor instruction written explicitly
Translated to binary representation by an
assembler
Resources available on the companion web site
Gumnut Storage
Arithmetic Instructions
Operate on register data and put result
in a register
add, addc, sub, subc
Can have immediate value operand
Condition codes
Z: 1 if result is zero, 0 if result is non-zero
C: carry out of add/addc, borrow out of
sub/subc
addc and subc include C bit in
operation
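The Z and C condition codes described above can be modelled in a few lines. This is a behavioural sketch under the slide's definitions, not the Gumnut implementation; the function names `add8` and `sub8` are hypothetical.

```python
def add8(a, b, carry_in=0):
    """Return (result, Z, C) for an 8-bit add; carry_in=C models addc."""
    total = a + b + carry_in
    result = total & 0xFF
    return result, int(result == 0), int(total > 0xFF)

def sub8(a, b, borrow_in=0):
    """Return (result, Z, C) for an 8-bit subtract; C is the borrow out."""
    diff = a - b - borrow_in
    result = diff & 0xFF
    return result, int(result == 0), int(diff < 0)

print(add8(200, 100))   # result wraps past 255, so C is set
print(sub8(5, 5))       # equal operands give a zero result, so Z is set
```

Passing the previous C value as `carry_in` or `borrow_in` is what lets addc and subc chain 8-bit operations into wider arithmetic.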
Arithmetic Instructions
Examples
add r3, r4, r1
add r5, r1, 2
sub r4, r4, 1
Evaluate 2x + 1; x in r3, result in r4
add r4, r3, r3 ; double x
add r4, r4, 1 ; then add 1
Logical Instructions
Operate on register data and put result
in a register
and, or, xor, mask (and not)
Operate bitwise on 8-bit operands
Can have immediate value operand
Condition codes
Z: 1 if result is zero, 0 if result is non-zero
C: always 0
Logical Instructions
Examples
and r3, r4, r5
or r1, r1, 0x80 ; set r1(7)
xor r5, r5, 0xFF ; invert r5
Set Z if least-significant 4 bits of r2 are 0101
and r1, r2, 0x0F ; clear high bits
sub r0, r1, 0x05 ; compare with 0101
Shift Instructions
Logical shift/rotate register data and
put result in a register
shl, shr, rol, ror
Count specified as a literal operand
Condition codes
Z: 1 if result is zero, 0 if result is non-zero
C: the value of the last bit shifted/rotated
past the end of the byte
Shift Instructions
Examples
shl r4, r1, 3
ror r2, r2, 4
Multiply r4 by 8, ignoring overflow
shl r4, r4, 3
Multiply r4 by 10, ignoring overflow
shl r1, r4, 1 ; multiply by 2
shl r4, r4, 3 ; multiply by 8
add r4, r4, r1
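The multiply-by-10 sequence relies on 10x = 2x + 8x. A minimal sketch of the same three steps, with 8-bit wraparound standing in for "ignoring overflow"; the helper names `shl8` and `times_ten` are made up for this example.

```python
def shl8(value, count):
    """8-bit logical shift left; bits shifted past bit 7 are discarded."""
    return (value << count) & 0xFF

def times_ten(r4):
    r1 = shl8(r4, 1)          # shl r1, r4, 1   ; 2 * r4
    r4 = shl8(r4, 3)          # shl r4, r4, 3   ; 8 * r4
    return (r4 + r1) & 0xFF   # add r4, r4, r1  ; 10 * r4, modulo 256

print(times_ten(7))
```

For inputs of 26 or more the true product no longer fits in 8 bits, and the sequence returns the product modulo 256.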
Memory Instructions
Transfer data between registers and data
memory
Compute address by adding an offset to a base
register value
Load register from memory
ldm r1, (r2)+5
Store from register to memory
stm r1, (r4)-2
Use r0 if base address is 0
ldm r3, 23 is equivalent to ldm r3, (r0)+23
Condition codes not affected
Memory Instructions
Increment a 16-bit integer in memory
Little-endian: address of least-significant byte in r2,
most-significant byte in next location
ldm r1, (r2) ; increment lsb
add r1, r1, 1
stm r1, (r2)
ldm r1, (r2)+1 ; increment msb
addc r1, r1, 0 ; with carry
stm r1, (r2)+1
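The six-instruction sequence can be mirrored step by step in software. This sketch assumes a `bytearray` standing in for data memory and a hypothetical function name `increment16`; it works because, as noted on the previous slide, memory instructions leave the condition codes unchanged, so the carry from the add is still available to the addc.

```python
def increment16(memory, address):
    """Increment the little-endian 16-bit value at memory[address..address+1]."""
    low = memory[address] + 1              # ldm, then add r1, r1, 1
    carry = int(low > 0xFF)                # C flag produced by the add
    memory[address] = low & 0xFF           # stm
    high = memory[address + 1] + carry     # ldm, then addc r1, r1, 0
    memory[address + 1] = high & 0xFF      # stm

memory = bytearray([0xFF, 0x01])           # holds 0x01FF, low byte first
increment16(memory, 0)
print(memory.hex())
```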
Input/Output Instructions
I/O controllers have registers that govern
their operation
Each has an address, like data memory
Gumnut has separate data and I/O address spaces
Input from I/O register
inp r3, 157 is equivalent to inp r3, (r0)+157
Output to I/O register
out r3, (r7) is equivalent to out r3, (r7)+0
Condition codes not affected
Further examples in Chapter 8
Branch Instructions
Programs can evaluate conditions and take
alternate courses of action
Condition codes (Z, C) represent outcomes of
arithmetic/logical/shift instructions
Branch instructions examine Z or C
bz, bnz, bc, bnc
Add a displacement to PC if condition is true
Specifies how many instructions forward or
backward to skip
Counting from instruction after branch
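The displacement rule can be written as a one-line calculation. The sketch assumes instruction addresses advance by one per instruction; the function name `branch_target` is hypothetical.

```python
def branch_target(branch_address, displacement, condition_true):
    """Address of the instruction executed after a conditional branch."""
    next_pc = branch_address + 1            # PC already points past the branch
    return next_pc + displacement if condition_true else next_pc

print(branch_target(11, 2, True))    # taken: skips two instructions
print(branch_target(11, 2, False))   # not taken: falls through
```

A negative displacement moves backward, which is how loops are formed.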
Branch Example
Elapsed seconds in location 100
Increment, wrapping to 0 after 59
ldm r1, 100
add r1, r1, 1
sub r0, r1, 60 ; Z set if r1 = 60
bnz +1 ; skip to store if Z is 0
add r1, r0, 0 ; otherwise reset r1 to 0
stm r1, 100
Jump Instruction
Unconditionally skips forward or backward to
specified address
Changes the PC to the address
Example: if r1 = 0, clear data location 100 to
0; otherwise clear location 200 to 0
Assume instructions start at address 10
10: sub r0, r1, 0
11: bnz +2
12: stm r0, 100
13: jmp 15
14: stm r0, 200
15: ...
Subroutines
A sequence of instructions that perform
some operation
Can call them from different parts of a
program using a jsb instruction
Subroutine returns with a ret instruction
Subroutine Example
Subroutine to increment second count
Address of count in r2
ldm r1, (r2)
add r1, r1, 1
sub r0, r1, 60
bnz +1
add r1, r0, 0
stm r1, (r2)
ret
Call to increment locations 100 and 102
add r2, r0, 100
jsb 20
add r2, r0, 102
jsb 20
Return Address Stack
The jsb saves the return address for
use by the ret
But what if the subroutine includes a jsb?
Gumnut core includes an 8-entry push-down
stack of return addresses
[Figure: return address stack contents. After two nested calls: return addr for first call, return addr for second call. After a third nested call: return addr for first call, return addr for second call, return addr for third call.]
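Nested calls work because each jsb pushes a return address and each ret pops the most recent one. A minimal model follows; the class name `ReturnStack` and the choice to raise an exception on overflow are assumptions, since the slide does not say what the hardware does when more than eight calls are nested.

```python
class ReturnStack:
    """Fixed-depth push-down stack of return addresses (depth 8, as on the slide)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []

    def jsb(self, pc_after_call, target):
        """Save the return address and return the new PC."""
        if len(self.entries) == self.depth:
            # Assumed behaviour for illustration only.
            raise OverflowError("more nested calls than stack entries")
        self.entries.append(pc_after_call)
        return target

    def ret(self):
        """Pop the most recent return address and return it as the new PC."""
        return self.entries.pop()

stack = ReturnStack()
pc = stack.jsb(pc_after_call=5, target=20)    # first call
pc = stack.jsb(pc_after_call=23, target=40)   # nested second call
print(stack.ret(), stack.ret())               # returns unwind in reverse order
```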
Miscellaneous Instructions
Instructions supporting interrupts
See Chapter 8
reti Return from interrupt
enai Enable interrupts
disi Disable interrupts
wait Wait for an interrupt
stby Stand by in low power mode until
an interrupt occurs
The Gumnut Assembler
Gasm: translates assembly programs
Generates memory images for program
text (binary-coded instructions) and data
See documentation on web site
Write a program as a text file
Instructions
Directives
Comments
Use symbolic labels
Example Program
; Program to determine greater of value_1 and value_2
                 text
                 org 0x000        ; start here on reset
                 jmp main
; Data memory layout
                 data
value_1:         byte 10
value_2:         byte 20
result:          bss 1
; Main program
                 text
                 org 0x010
main:            ldm r1, value_1  ; load values
                 ldm r2, value_2
                 sub r0, r1, r2   ; compare values
                 bc value_2_greater
                 stm r1, result   ; value_1 is greater
                 jmp finish
value_2_greater: stm r2, result   ; value_2 is greater
finish:          jmp finish       ; idle loop
Gumnut Instruction Encoding
Instructions are a form of information
Can be encoded in binary
Gumnut encoding
18 bits per instruction
Divided into fields representing different
aspects of the instruction
Opcodes and function codes
Register numbers
Addresses
Gumnut Instruction Encoding
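Fields listed from most-significant bit, width in bits in parentheses
Arith/Logical Register:   1110 (4) | rd (3) | rs (3) | rs2 (3) | unused (2) | fn (3)
Arith/Logical Immediate:  0 (1) | fn (3) | rd (3) | rs (3) | immed (8)
Shift:                    110 (3) | unused (1) | rd (3) | rs (3) | count (3) | unused (3) | fn (2)
Memory, I/O:              10 (2) | fn (2) | rd (3) | rs (3) | offset (8)
Branch:                   111110 (6) | fn (2) | unused (2) | disp (8)
Jump:                     11110 (5) | fn (1) | addr (12)
Miscellaneous:            1111110 (7) | fn (3) | unused (8)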
Encoding Examples
Encoding for addc r3, r5, 24
Arithmetic immediate, fn = 001
Format: 0 (1) | fn (3) | rd (3) | rs (3) | immed (8)
Bits:   0 | 001 | 011 | 101 | 00011000
Hexadecimal: 05D18
Instruction encoded by 3ECFC
Format: 111110 (6) | fn (2) | unused (2) | disp (8)
Bits:   111110 | 11 | 00 | 11111100
Branch bnc -4
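The two encodings can be reproduced by packing the fields with shifts. This is a sketch under stated assumptions: the field positions follow the widths on the encoding slide read from the most-significant bit down, unused bits are taken to be 0, and the function code 11 for bnc is assumed. The function names are hypothetical.

```python
def encode_arith_immed(fn, rd, rs, immed):
    """0 | fn(3) | rd(3) | rs(3) | immed(8), 18 bits in total."""
    return (fn << 14) | (rd << 11) | (rs << 8) | (immed & 0xFF)

def encode_branch(fn, disp):
    """111110 | fn(2) | 2 unused bits | disp(8, two's complement)."""
    return (0b111110 << 12) | (fn << 10) | (disp & 0xFF)

addc = encode_arith_immed(fn=0b001, rd=3, rs=5, immed=24)   # addc r3, r5, 24
bnc = encode_branch(fn=0b11, disp=-4)                        # bnc -4 (fn assumed)
print(f"{addc:05X} {bnc:05X}")
```

Under these assumptions the sketch yields 05D18 for addc r3, r5, 24 and 3ECFC for bnc -4. The leading hexadecimal digit of the branch is 3 because the opcode 111110 sets both of the top two bits of the 18-bit word.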
Other Instruction Sets
8-bit cores and microcontrollers
Xilinx PicoBlaze: like Gumnut
8051, and numerous like it
Originated as 8-bit microprocessors
Instructions encoded as one or more bytes
Instruction set is more complex and irregular
Complex instruction set computer (CISC)
C.f. Reduced instruction set computer (RISC)
16-, 32- and 64-bit cores
Mostly RISC
E.g., PowerPC, ARM, MIPS, Tensilica, ...
Instruction and Data Memory
In embedded systems
Instruction memory is usually ROM, flash,
SRAM, or combination
Data memory is usually SRAM
DRAM if large capacity needed
Processor/memory interfacing
Gluing the signals together

Example: Gumnut Memory
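[Figure: gumnut core connected to an instruction ROM (adr, dat_o, en) and a data SRAM (adr, dat_o, dat_i, en, we), with two D flip-flops; all clocked by clk_i.
Instruction port signals: inst_cyc_o, inst_stb_o, inst_ack_i, inst_adr_o, inst_dat_i.
Data port signals: data_cyc_o, data_stb_o, data_we_o, data_ack_i, data_adr_o, data_dat_o, data_dat_i.
Other inputs: rst_i, clk_i.]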
Example: Gumnut Memory
always @(posedge clk) // Instruction memory
  if (inst_cyc_o && inst_stb_o) begin
    inst_dat_i <= inst_ROM[inst_adr_o[10:0]];
    inst_ack_i <= 1'b1;
  end
  else
    inst_ack_i <= 1'b0;
Example: Gumnut Memory
always @(posedge clk) // Data memory
  if (data_cyc_o && data_stb_o)
    if (data_we_o) begin
      data_RAM[data_adr_o] <= data_dat_o;
      data_dat_i <= data_dat_o;
      data_ack_i <= 1'b1;
    end
    else begin
      data_dat_i <= data_RAM[data_adr_o];
      data_ack_i <= 1'b1;
    end
  else
    data_ack_i <= 1'b0;
Example: Microcontroller Memory
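[Figure: 8051 connected to an external SRAM through an address latch.
8051 signals: P0, P2, ALE, PSEN, RD, WR.
Latch signals: D, LE, Q.
SRAM signals: A(16), A(15..8), A(7..0), D, CE, WE, OE.]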
32-bit Memory
Four bytes per memory word
Little-endian: least-significant byte at lowest address
Big-endian: most-significant byte at lowest address
0 1 2 3
4 5 6 7
8 9 10 11
Partial-word read
Read all bytes, processor selects those needed
Partial-word write
Use byte-enable signals
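A partial-word write updates only the byte lanes whose enable is asserted. The sketch below models a word-organized memory as a list of 4-byte `bytearray`s; the function name `write_word` and the lane numbering are assumptions for illustration.

```python
def write_word(memory, word_index, data, byte_enable):
    """Write selected bytes of a 32-bit word.

    memory: list of 4-byte bytearrays, one per memory word.
    data: the 4 bytes presented on the data bus.
    byte_enable: 4 booleans; only enabled byte lanes are updated.
    """
    for lane in range(4):
        if byte_enable[lane]:
            memory[word_index][lane] = data[lane]

memory = [bytearray(4) for _ in range(2)]
bus = bytes([0xAA, 0xBB, 0xCC, 0xDD])
write_word(memory, 1, bus, [False, True, False, False])   # single-byte write
print(memory[1].hex())
```

A full-word write is the same operation with all four enables asserted.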
Example: MicroBlaze Memory
[Figure: MicroBlaze connected to four byte-wide SSRAMs, one per byte lane (data bits 0:7, 8:15, 16:23, 24:31).
Each SSRAM has D_in, A, en, wr, D_out, and clk pins; the address connection is labelled 2:16.
MicroBlaze signals: Addr, Data_Write, Data_Read, AS, Read_Strobe, Write_Strobe, Byte_Enable(0) to Byte_Enable(3), Ready, Clk.
A +V connection is also shown.]
Cache Memory
For high-performance processors
Memory access time is several clock cycles
Performance bottleneck
Cache memory
Small fast memory attached to a processor
Stores most frequently accessed items,
plus adjacent items
Locality: those items are most likely to be
accessed again soon
Cache Memory
Memory contents divided into fixed-sized
blocks (lines)
Cache copies whole lines from memory
When processor accesses an item
If item is in cache: hit - fast access
Occurs most of the time
If item is not in cache: miss
Line containing item is copied from memory
Slower, but less frequent
May need to replace a line already in cache
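Hit, miss, and line replacement can be seen in a small simulation. The slide does not specify how lines are placed in the cache; this sketch assumes a direct-mapped organization (each memory line has exactly one possible slot) with 4 slots of 4 items each, and all names are hypothetical.

```python
class DirectMappedCache:
    """Tracks which memory line each cache slot holds (tags only, no data)."""

    def __init__(self, num_lines=4, line_size=4):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line_number = address // self.line_size   # memory line containing the item
        slot = line_number % self.num_lines        # the one slot that line may use
        if self.tags[slot] == line_number:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[slot] = line_number              # copy line in, replacing any old one
        return False

cache = DirectMappedCache()
for address in [0, 1, 2, 3, 4, 0, 16, 0]:
    cache.access(address)
print(cache.hits, cache.misses)
```

Addresses 1 to 3 hit because the miss on address 0 brought in the whole line. Address 16 maps to the same slot as address 0 and replaces its line, so the final access to address 0 misses again.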
Fast Main Memory Access
Optimize memory for line access by cache
Wide memory
Read a line in one access
Burst transfers
Send starting address, then read successive locations
Pipelining
Overlapping stages of memory access
E.g., address transfer, memory operation, data transfer
Double data rate (DDR), Quad data rate (QDR)
Transfer on both rising and falling clock edges
Summary
Embedded computer
Processor, memory, I/O controllers, buses
Microprocessors, microcontrollers, and
processor cores
Soft-core processors for ASIC/FPGA
Processor instruction sets
Binary encoding for instructions
Assembly language programs
Memory interfacing
