
ARM integer cores

Outline:
the ARM 3-stage pipeline
the ARM7TDMI core
the ARM 5-stage pipeline
the ARM9TDMI core
the ARM10TDMI core

The 3-stage ARM pipeline

fetch
  the instruction is fetched from memory
decode
  the instruction is decoded and the datapath control signals are prepared for the next cycle
execute
  the operands are read from the register bank, shifted, combined in the ALU, and the result is written back

2001 PEVEIT Unit - ARM System Design

[Figure: three instructions, each passing through fetch, decode and execute; each starts one clock cycle after the previous one, so the stages overlap along the time axis.]

Single-cycle instructions complete at a rate of one per clock cycle.

More complex instructions:

[Figure: the sequence ADD, STR, ADD, ADD, ADD plotted against time. The STR needs two execute cycles (calc. addr., then data xfer), so the instructions behind it are delayed by one cycle.]
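The timing in the two figures can be reproduced with a small cycle-counting model. This is a simplified sketch, not a model of the real ARM7 control logic: it assumes each stage holds one instruction at a time, takes the execute-cycle counts from the figure (ADD = 1, STR = 2), and ignores memory-port contention. The function and type names are hypothetical.

```python
from typing import List, Tuple

# (fetch, decode, execute_start, execute_end) cycle numbers, counting from 0
Timing = Tuple[int, int, int, int]


def schedule(exec_cycles: List[int]) -> List[Timing]:
    """In-order 3-stage pipeline where each stage holds one instruction.

    exec_cycles[i] is the number of cycles instruction i spends in execute.
    Memory-port contention is ignored (simplifying assumption).
    """
    rows: List[Timing] = []
    for i, n in enumerate(exec_cycles):
        if i == 0:
            f, d, e = 0, 1, 2
        else:
            _, prev_d, prev_e, prev_end = rows[-1]
            f = prev_d                     # fetch frees when the previous instruction moves to decode
            d = max(f + 1, prev_e)         # decode frees when the previous instruction starts executing
            e = max(d + 1, prev_end + 1)   # execute frees when the previous instruction finishes
        rows.append((f, d, e, e + n - 1))
    return rows


if __name__ == "__main__":
    program = [("ADD", 1), ("STR", 2), ("ADD", 1), ("ADD", 1), ("ADD", 1)]
    for (name, _), (f, d, e, end) in zip(program, schedule([n for _, n in program])):
        print(f"{name:4} fetch={f} decode={d} execute={e}..{end}")
```

With five single-cycle instructions the last one finishes executing in cycle 6 (counting from 0); making the second one a two-cycle STR pushes that to cycle 7, which is the one-cycle delay shown in the second figure.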

PC behaviour
  because of the pipeline, r15 has been incremented twice by the time an instruction executes
  therefore r15 = address of instruction + 8
  (+12 if read after the first execute cycle, though this is architecturally undefined)
  in Thumb code the offset is +4
  normally the assembler makes the necessary adjustments, e.g. in branches

3-stage ARM organization

ARM components:
  register bank
    2 read ports, 1 write port
    plus additional read and write ports for r15
  barrel shifter
  ALU
  address register and incrementer
  memory data registers
  instruction decoder and control

[Figure: 3-stage ARM organization. Blocks: address register and incrementer (driving A[31:0]), register bank including the PC, multiply register, barrel shifter, ALU, instruction decode & control, data out register and data in register (on D[31:0]). Buses: PC bus, ALU bus, A bus and B bus.]
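The PC behaviour noted earlier (r15 reads as the instruction address + 8 in ARM state, + 4 in Thumb state) is exactly the adjustment an assembler applies when it encodes a branch. A minimal sketch, assuming the standard ARM B/BL encoding with a signed 24-bit word offset relative to r15; the function names are hypothetical and the +12 case is not modelled.

```python
def pc_read_value(insn_addr: int, thumb: bool = False) -> int:
    """Value read from r15 by an instruction at insn_addr (normal, first-cycle case)."""
    return insn_addr + (4 if thumb else 8)


def arm_branch_field(insn_addr: int, target: int) -> int:
    """24-bit offset field of an ARM B/BL at insn_addr that branches to target."""
    offset = target - pc_read_value(insn_addr)
    if offset % 4 != 0:
        raise ValueError("ARM branch targets must be word aligned")
    words = offset >> 2  # arithmetic shift keeps the sign
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target out of range for a 24-bit word offset")
    return words & 0xFFFFFF


def arm_branch_target(insn_addr: int, field: int) -> int:
    """Inverse of arm_branch_field: sign-extend the field and add it to r15."""
    if field & 0x800000:
        field -= 1 << 24
    return pc_read_value(insn_addr) + (field << 2)
```

For example, a branch to itself at 0x8000 gets the field 0xFFFFFE (a word offset of -2), because r15 already reads 0x8008 when the branch executes.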

The ARM7TDMI

The ARM7TDMI is an ARM7 3-stage pipeline core, with
  T - support for the Thumb instruction set
  D - support for debug
    the processor can stop on a debug event
  M - support for long multiplies
  I - the EmbeddedICE macrocell
    provides breakpoint and watchpoint hardware (described later)

ARM7TDMI organization

[Figure: ARM7TDMI organization. Blocks: processor core, EmbeddedICE (extern0, extern1), bus splitter (Din[31:0], Dout[31:0], D[31:0]), JTAG TAP controller (TCK, TMS, TRST, TDI, TDO), and scan chains 0, 1 and 2. Core signals shown: A[31:0], opc, r/w, mreq, trans, mas[1:0] and other signals.]
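The M in ARM7TDMI refers to the long multiply instructions, which produce a full 64-bit product split across two registers (RdLo, RdHi). The sketch below models only the arithmetic of UMULL (unsigned) and SMULL (signed); it does not model the accumulate forms or cycle counts, and the helper names are hypothetical.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def umull(rm: int, rs: int) -> Tuple[int, int]:
    """Unsigned 32 x 32 -> 64-bit multiply; returns (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32


def smull(rm: int, rs: int) -> Tuple[int, int]:
    """Signed 32 x 32 -> 64-bit multiply; returns (RdLo, RdHi)."""
    product = (to_signed32(rm) * to_signed32(rs)) & ((1 << 64) - 1)  # 64-bit two's complement
    return product & MASK32, product >> 32
```

The same operand bits give different results: 0xFFFFFFFF times 2 is 0x1_FFFFFFFE unsigned, but -2 (all ones in RdHi) signed.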

The ARM7TDMI core interface signals

clock control: mclk, wait, eclk
configuration: bigend
interrupts: irq, fiq, isync
initialization: reset
bus control: enin, enout, enouti, abe, ale, ape, dbe, tbe, busen, highz, busdis, ecapclk
debug: dbgrq, breakpt, dbgack, exec, extern1, extern0, dbgen, rangeout0, rangeout1, dbgrqi, commrx, commtx
coprocessor interface: opc, cpi, cpa, cpb
power: Vdd, Vss
memory interface: A[31:0], Din[31:0], Dout[31:0], D[31:0], bl[3:0], r/w, mas[1:0], mreq, seq, lock
MMU interface: trans, mode[4:0], abort
state: Tbit
TAP information: tapsm[3:0], ir[3:0], tdoen, tck1, tck2, screg[3:0]
boundary scan extension: drivebs, ecapclkbs, icapclkbs, highz, pclkbs, rstclkbs, sdinbs, sdoutbs, shclkbs, shclk2bs
JTAG controls: TRST, TCK, TMS, TDI, TDO

ARM7TDMI debug support

The EmbeddedICE module
  supports breakpoints and watchpoints
  is controlled via the JTAG test access port
EmbeddedICE and JTAG are covered later.
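A watchpoint is essentially a masked comparison of a bus value against a programmed value. The sketch below assumes, from our reading of the ARM7TDMI documentation, that each EmbeddedICE watchpoint unit has a value register and a mask register in which a set bit means "don't care"; treat that as an assumption. The real unit also compares data and control signals; only the address comparison is sketched, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def address_matches(addr: int, value: int, mask: int) -> bool:
    """Masked comparison (assumed scheme): bits set in mask are ignored."""
    return ((addr ^ value) & ~mask & MASK32) == 0
```

With value 0x40000000 and mask 0xFF, any access in 0x40000000 to 0x400000FF matches, so a single unit can watch a 256-byte block.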

ARM7TDMI characteristics:

Process       0.35 µm    Transistors  74,209       MIPS     60
Metal layers  3          Core area    2.1 mm²      Power    87 mW
Vdd           3.3 V      Clock        0 to 66 MHz  MIPS/W   690


Getting higher performance

Increase the clock rate
  the clock rate is limited by the slowest pipeline stage
    decrease the logic complexity per stage
    increase the pipeline depth (number of stages)
Improve the CPI (clocks per instruction)
  fewer wasted cycles
  better memory bandwidth

Reducing the CPI

ARM7 uses the memory on nearly every clock cycle
  for either instruction fetch or data transfer
therefore a reduced CPI requires more than one memory access per clock cycle
Possible solutions are:
  separate instruction and data memories
  double-bandwidth memory (e.g. ARM8)

The 5-stage ARM pipeline

[Figure: the 5-stage pipeline used in the ARM9TDMI: Fetch (instruction fetch), Decode (instruction decode and register read), Execute (shift and ALU), Memory (data memory access), Write-back (register write).]
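The two levers listed under getting higher performance combine in one relation: execution time = instructions x CPI / clock rate, where the clock period must cover the slowest stage. A minimal sketch; the stage delays and CPI values in the usage note are made-up illustrative numbers, not measurements of any ARM core, and the function names are hypothetical.

```python
from typing import Sequence


def max_clock_mhz(stage_delays_ns: Sequence[float]) -> float:
    """The clock period must cover the slowest stage: f_max = 1 / max(delay)."""
    return 1000.0 / max(stage_delays_ns)


def run_time_us(instructions: int, cpi: float, clock_mhz: float) -> float:
    """Execution time = instructions x CPI / clock rate (MHz = cycles per microsecond)."""
    return instructions * cpi / clock_mhz
```

With hypothetical stage delays of 5, 5 and 10 ns, a 3-stage design tops out at 100 MHz; rebalancing the logic into five stages of at most 5 ns allows 200 MHz. One million instructions at an assumed CPI of 1.9 and 100 MHz take about 19,000 µs, while a CPI of 1.5 at 200 MHz takes 7,500 µs, so both the deeper pipeline and the lower CPI contribute.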

ARM9TDMI

The ARM9TDMI:
  is a classic Harvard-architecture 5-stage pipeline
    with separate instruction and data memory ports
  has full support for Thumb and EmbeddedICE debug
  is aimed at significantly higher performance than the ARM7TDMI
    its enhanced pipeline operates at 100-200 MHz
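Why separate instruction and data ports reduce the CPI can be seen by counting memory accesses. This sketch computes only the lower bound on cycles set by memory bandwidth: every instruction needs one fetch and each LDR/STR also needs one data access. It ignores multi-word transfers (LDM/STM), stalls and caches, and the names are hypothetical.

```python
from typing import Sequence

DATA_OPS = {"LDR", "STR"}


def min_cycles(ops: Sequence[str], separate_memories: bool) -> int:
    """Lower bound on cycles imposed by memory accesses alone.

    With one memory port, fetches and data accesses take turns; with separate
    instruction and data memories, a fetch and a data access can overlap.
    """
    fetches = len(ops)
    data = sum(1 for op in ops if op in DATA_OPS)
    return max(fetches, data) if separate_memories else fetches + data
```

For the mix ADD, LDR, ADD, STR, SUB the single-port bound is 7 cycles (CPI of at least 1.4), while separate memories allow 5 cycles (CPI of 1.0).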

ARM9TDMI pipeline

[Figure: pipeline comparison.
ARM7TDMI: Fetch (instruction fetch), Decode (Thumb decompress, ARM decode), Execute (reg read, shift/ALU, reg write).
ARM9TDMI: Fetch (instruction fetch), Decode (decode, r. read), Execute (shift/ALU), Memory (data memory access), Write (reg write).]

In the ARM9TDMI, Thumb instructions are decoded directly, with no separate Thumb decompression step.

The detailed ARM9TDMI pipeline is very similar to that of StrongARM (see CPU section).

[Figure: detailed ARM9TDMI pipeline with stages fetch, instruction decode, execute, buffer/data and write-back. Labels include next pc, +4, pc + 4, pc + 8, r15, I-cache, I decode, register read, immediate fields, mul, LDM/STM, reg shift, shift, ALU, forwarding paths, pre-index, post-index, mux (B, BL, MOV pc, SUBS pc), load/store address, D-cache, byte repl., rot/sgn ex, LDR pc and register write. Annotation: no separate branch adder.]
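The forwarding paths in the figure let a result feed the next instruction without waiting for write-back, but a loaded value only appears after the Memory stage. This sketch assumes that ALU results forward without a stall while an instruction that uses the register loaded by the immediately preceding LDR waits one cycle; we believe this matches the ARM9TDMI for a simple LDR, but treat it as an assumption. The names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def load_use_stalls(program: List[Instr]) -> int:
    """Count one-cycle interlocks (assumed model): ALU results are forwarded
    without a stall, but a loaded value is one cycle too late for the next
    instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "LDR" and prev.dest in cur.srcs:
            stalls += 1
    return stalls
```

Under this model, LDR r0 followed directly by ADD r2, r0, #1 costs one stall; moving an independent instruction between them removes it, which is the kind of scheduling a compiler can do.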

ARM9TDMI

EmbeddedICE
  as ARM7TDMI, plus:
    hardware single-stepping
    breakpoints on exceptions
On-chip coprocessor support
  for floating-point, DSP, and so on

ARM9TDMI characteristics:

Process       0.25 µm    Transistors  111,000       MIPS     220
Metal layers  3          Core area    2.1 mm²       Power    150 mW
Vdd           2.5 V      Clock        0 to 200 MHz  MIPS/W   1,500
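The MIPS/W entries in the two characteristics tables follow from the MIPS and power rows. A minimal check using only the tabulated figures; the function name is hypothetical.

```python
def mips_per_watt(mips: float, power_mw: float) -> float:
    """Efficiency: MIPS divided by power in watts."""
    return mips / (power_mw / 1000.0)


CORES = {"ARM7TDMI": (60, 87), "ARM9TDMI": (220, 150)}  # (MIPS, mW) from the tables

if __name__ == "__main__":
    for name, (mips, mw) in CORES.items():
        print(f"{name}: {mips_per_watt(mips, mw):.0f} MIPS/W")
```

This gives 690 MIPS/W for the ARM7TDMI, matching its table, and about 1,467 MIPS/W for the ARM9TDMI, which the table appears to round to 1,500.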

ARM10TDMI

The ARM10TDMI is aimed at significantly higher performance than the ARM9TDMI, achieved through:
  a higher clock rate
  64-bit I- and D-memory buses
  branch prediction
  a hit-under-miss D-memory interface

ARM10TDMI pipeline

[Figure: ARM10TDMI 6-stage pipeline: Fetch, Issue, Decode, Execute, Memory, Write. Labels include branch prediction, instruction fetch, decode, r. read, shift/ALU, multiply, addr. calc., data memory access, multiplier partials add, data write and reg write.]

Additional time is allowed for
  I- and D-memory accesses
  instruction decode
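The slides list branch prediction among the ARM10TDMI's improvements without describing the predictor. To illustrate the idea only, this sketch assumes a static backward-taken/forward-not-taken rule, a common static scheme; it is not a description of the ARM10TDMI hardware, and the trace in the usage note is invented for illustration.

```python
from typing import List, Tuple

# (branch address, target address, actually taken?)
Event = Tuple[int, int, bool]


def predict_taken(branch_addr: int, target_addr: int) -> bool:
    """Assumed static rule: backward branches (typically loop closers) are
    predicted taken, forward branches not taken."""
    return target_addr < branch_addr


def accuracy(trace: List[Event]) -> float:
    """Fraction of branches whose outcome the static rule predicts correctly."""
    hits = sum(predict_taken(b, t) == taken for b, t, taken in trace)
    return hits / len(trace)
```

For a loop branch at 0x100 back to 0xF0 that is taken 9 times and then falls through once, plus a forward branch at 0x200 to 0x220 that is skipped 8 times and taken twice, the rule is right on 17 of 20 branches (85%).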
