
Major CPU Design Steps

Datapath:

1. Analyze instruction set operations using independent RTN.
   ISA => RTN => datapath requirements.
   This provides the required datapath components and how they are
   connected to meet ISA requirements.

2. Select required datapath components and connections, and
   establish the clock methodology (e.g. clock edge-triggered).
   + Determine the number of cycles per instruction and the operations in each cycle.

3. Assemble a datapath meeting the requirements.

Control:

4. Identify and define the function of all control points or
   signals needed by the datapath.
   Analyze the implementation of each instruction to determine the setting of the
   control points that affect its operations and register transfers,
   for each cycle of the instruction.

5. Design & assemble the control logic.
   Hard-Wired: Finite-state machine implementation.
   Microprogrammed: i.e. using a control program.

(3rd Edition Chapter 5.5, see handout. Not in 4th Edition.)

EECC550 - Shaaban
#1 Lec # 5 Winter 2012 12-18-2012

Single Cycle MIPS Datapath:

[Figure: single-cycle MIPS datapath. Components: PC, instruction memory (Adr, Instruction<31:0>), register file of 32 32-bit registers (Rw, Ra, Rb; busA = R[rs], busB = R[rt], busW), extender for imm16, main ALU with ALU control (inputs: ALUop, 2 bits, and the function field), data memory (WrEn, Adr, Data In), PC+4 adder and branch-target adder, and multiplexors. Control signals: RegDst, RegWr, ExtOp, ALUSrc, ALUop, MemWr, MemtoReg, Branch, PCSrc; the ALU Zero output is combined with Branch.]

CPI = 1, Long Clock Cycle
T = I x CPI x C
Jump Not Included
(Includes ORI, not in book version)

Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added

[Figure: single-cycle datapath with jump support and control unit. The jump address [31-0] is formed from Instruction [25-0] shifted left 2 (26 bits to 28 bits) concatenated with PC + 4 [31-28]. The control unit takes the opcode, Instruction [31-26], and produces RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite. Instruction fields: rs = Instruction [25-21], rt = Instruction [20-16], rd = Instruction [15-11], imm16 = Instruction [15-0], function field = Instruction [5-0] (to ALU control).]

(4th Edition Figure 4.24 page 329; 3rd Edition Figure 5.24 page 314)

In this book version, ORI is not supported, so no zero extension of the immediate is needed.

ALUOp (2 bits):
   00 = add
   01 = subtract
   10 = R-Type

Drawbacks of Single-Cycle Processor


1. Long cycle time:
   All instructions must take as much time as the slowest (CPI = 1).
   Here, the cycle time needed for load is longer than needed for all other instructions.
   Real memory is not as well-behaved as idealized memory:
   it cannot always complete a data access in one (short) cycle.

2. Impossible to implement complex, variable-length instructions and
   complex addressing modes in a single cycle,
   e.g. indirect memory addressing.

3. High and duplicate hardware resource requirements:
   no hardware functional unit can be used more than once in
   a single cycle (e.g. ALUs).

4. Cannot pipeline (overlap) the processing of one instruction with the
   previous instructions
   (instruction pipelining, 4th edition chapter 4, 3rd edition ch. 6).

Abstract View of Single Cycle CPU


[Figure: abstract view of the single-cycle CPU. Stages in order: Next PC / Instruction Fetch (2 ns), Register Fetch (1 ns), Ext and ALU (2 ns), Data Mem access (2 ns), Result Store / register write (1 ns). Main Control (input: op) and ALU control (input: fun) drive RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd and MemWr; Equal feeds the Branch, Jump logic of Next PC.]

One CPU clock cycle: duration C = 8 ns
Critical path = C = 8 ns (LW)
One instruction per cycle: CPI = 1

Assuming the following datapath/control hardware component delays:
   Memory Units: 2 ns
   ALU and adders: 2 ns
   Register File: 1 ns
   Control Unit: < 1 ns

Single Cycle Instruction Timing


Arithmetic & Logical:  PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
Load:                  PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup   (critical path)
Store:                 PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
Branch:                PC -> Inst Memory -> Reg File -> cmp -> mux

Delays shown on the load path: Inst Memory 2 ns, Reg File 1 ns, ALU 2 ns, Data Mem 2 ns, register write 1 ns.

Critical Path: Load - LW (e.g. C = 8 ns). It determines the CPU clock cycle, C.
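The per-instruction delays behind the 8 ns figure can be tallied with a short sketch. It assumes the component delays stated on these slides (memory 2 ns, ALU 2 ns, register file 1 ns), ignores mux, setup and control delays, and treats the branch comparison as one ALU delay. The dictionary and function names are illustrative only.

```python
# Component delays in ns, as assumed on the slides.
DELAY_NS = {"inst_mem": 2, "reg_read": 1, "alu": 2, "data_mem": 2, "reg_write": 1}

# Major units each instruction class passes through (mux/setup delays ignored).
PATHS = {
    "arith_logic": ["inst_mem", "reg_read", "alu", "reg_write"],
    "load": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["inst_mem", "reg_read", "alu", "data_mem"],
    "branch": ["inst_mem", "reg_read", "alu"],  # assumption: cmp costs one ALU delay
}


def path_delay(kind):
    """Total delay in ns of the units used by one instruction class."""
    return sum(DELAY_NS[unit] for unit in PATHS[kind])


def single_cycle_clock():
    """The single-cycle clock must cover the slowest instruction class."""
    return max(path_delay(kind) for kind in PATHS)


if __name__ == "__main__":
    for kind in PATHS:
        print(kind, path_delay(kind), "ns")
    print("clock cycle C =", single_cycle_clock(), "ns")
```

Under these assumptions only the load needs the full cycle; every other class finishes early and idles for the rest of it.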

Clock Cycle Time & Critical Path


One CPU clock cycle: duration C = 8 ns here.
Critical path (i.e. longest delay): LW in this case.

Critical path: the slowest path between any two storage devices.
Clock cycle time is a function of the critical path, and must be
greater than:
   Clock-to-Q + Longest Delay Path through the Combinational Logic
   + Setup + Clock Skew

Assuming the following datapath/control hardware component delays:
   Memory Units: 2 ns
   ALU and adders: 2 ns
   Register File: 1 ns
   Control Unit: < 1 ns

Reducing Cycle Time: Multi-Cycle Design

Cut the combinational dependency graph by inserting registers / latches.
The same work is done in two or more shorter cycles, rather than one
long cycle.

One long cycle (e.g. CPI = 1):
   storage element -> Acyclic Combinational Logic -> storage element

=> Two shorter cycles (e.g. CPI = 2):
   storage element -> Acyclic Combinational Logic (A) [Cycle 1] -> storage element
                   -> Acyclic Combinational Logic (B) [Cycle 2] -> storage element

Storage element: register or memory.

Place registers to:
   Get a balanced clock cycle length
   Save any results needed for the remaining cycles

Basic MIPS Instruction Processing Steps


Instruction Fetch:   Obtain instruction from program storage (instruction memory).
                     Instruction ← Mem[PC]
Next Instruction:    Update program counter to the address of the next instruction.
                     PC ← PC + 4
Instruction Decode:  Determine instruction type (done by the Control Unit).
                     Obtain operands from registers.
Execute:             Compute result value or status.
Result Store:        Store result in register/memory if needed
                     (usually called Write Back).

These are the common steps for all instructions.

T = I x CPI x C

Partitioning The Single Cycle Datapath


Add registers between steps to break the instruction into cycles:

   1. Instruction Fetch Cycle (IF)
   2. Instruction Decode Cycle (ID)
   3. Execution Cycle (EX)
   4. Data Memory Access Cycle (MEM)
   5. Write Back Cycle (WB)

Place registers to:
   Get a balanced clock cycle length
   Save any results needed for the remaining cycles

[Figure: the abstract single-cycle datapath (Next PC, Instruction Fetch 2 ns, Operand Fetch 1 ns, Ext / Exec 2 ns, Mem Access 2 ns, Result Store 1 ns) with registers inserted between the steps. The fetched instruction also goes to the Control Unit. Control signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr; Branch, Jump feed Next PC.]

Thus: C = 2 ns, f = 500 MHz

Example Multi-cycle Datapath

[Figure: multi-cycle datapath split into five cycles: 1. Instruction Fetch (IF), 2 ns; 2. Instruction Decode (ID), read registers, 1 ns; 3. Execution (EX), Ext and ALU, 2 ns; 4. Memory (MEM), data memory access, 2 ns; 5. Write Back (WB), write to register file, 1 ns. The instruction held in IR goes to the Control Unit. Control signals shown: ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr; Equal and Branch, Jump feed Next PC.]

All clock-edge triggered (register write enable control lines not shown).

Registers added:
   IR:    Instruction register
   A, B:  Two registers to hold operands read from the register file, i.e. R[rs], R[rt]
   R:     or ALUOut, holds the output of the main ALU (ALU result)
   M:     or Memory data register (MDR), holds data read from data memory

CPU clock cycle time: worst cycle delay = C = 2 ns
Thus clock rate: f = 1 / 2 ns = 500 MHz (ignoring MUX, CLK-Q delays)

Assuming the following datapath/control hardware component delays:
   Memory Units: 2 ns
   ALU and adders: 2 ns
   Register File: 1 ns
   Control Unit: < 1 ns
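As a rough sketch of how the 2 ns cycle and 500 MHz rate follow from the per-cycle delays, the snippet below takes the slowest of the five cycles as the clock period. The stage delays are the ones given on the slide; MUX and CLK-Q delays are ignored, as the slide does, and the names are illustrative.

```python
# Delay of the work done in each cycle, in ns (from the slide).
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def cycle_time_ns(stage_delays):
    """The clock period must cover the slowest cycle."""
    return max(stage_delays.values())


def clock_rate_mhz(cycle_ns):
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / cycle_ns


if __name__ == "__main__":
    c = cycle_time_ns(STAGE_DELAY_NS)
    print("C =", c, "ns, f =", clock_rate_mhz(c), "MHz")
```

The 1 ns cycles (ID, WB) leave half of each clock period unused, which is why balanced register placement matters.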

Operations (Dependent RTN) for Each Cycle

IF (Instruction Fetch), all instructions:
   IR ← Mem[PC]

ID (Instruction Decode), all instructions:
   A ← R[rs]
   B ← R[rt]

EX (Execution):
   R-Type:           R ← A funct B
   Logic Immediate:  R ← A OR ZeroExt[imm16]
   Load:             R ← A + SignEx(Im16)
   Store:            R ← A + SignEx(Im16)
   Branch:           Zero ← A - B
                     If Zero = 1:  PC ← PC + 4 + (SignExt(imm16) x4)
                     else (i.e. Zero = 0):  PC ← PC + 4

MEM (Memory):
   Load:             M ← Mem[R]
   Store:            Mem[R] ← B
                     PC ← PC + 4

WB (Write Back):
   R-Type:           R[rd] ← R
                     PC ← PC + 4
   Logic Immediate:  R[rt] ← R
                     PC ← PC + 4
   Load:             R[rt] ← M
                     PC ← PC + 4

Instruction Fetch (IF) & Instruction Decode (ID) cycles
are common for all instructions.

MIPS Multi-Cycle Datapath: Five Cycles of Load

         Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load     IF        ID        EX        MEM       WB

CPI = 5

1- Instruction Fetch (IF): Fetch the instruction from instruction memory.
2- Instruction Decode (ID): Operand register fetch and instruction decode.
3- Execute (EX): Calculate the effective memory address.
4- Memory (MEM): Read the data from the data memory.
5- Write Back (WB): Write the loaded data to the register file. Update PC.
Multi-cycle Datapath Instruction CPI


R-Type/Immediate: Require four cycles, CPI = 4
   IF, ID, EX, WB

Loads: Require five cycles, CPI = 5
   IF, ID, EX, MEM, WB

Stores: Require four cycles, CPI = 4
   IF, ID, EX, MEM

Branches/Jumps: Require three cycles, CPI = 3
   IF, ID, EX

Average or effective program CPI: 3 ≤ CPI ≤ 5,
depending on program profile (instruction mix).

C = 2 ns, f = 500 MHz

Single Cycle Vs. Multi-Cycle CPU


Single Cycle Implementation: clock period 8 ns (125 MHz)
   Cycle 1:  Load (8 ns)
   Cycle 2:  Store (the unused remainder of the 8 ns cycle is wasted)

Multiple Cycle Implementation: clock period 2 ns (500 MHz)
   Cycles 1-5:  Load:    IF  ID  EX  MEM  WB
   Cycles 6-9:  Store:   IF  ID  EX  MEM
   Cycle 10:    R-type:  IF  ...

T = I x CPI x C

Single-Cycle CPU:
   CPI = 1, C = 8 ns, f = 125 MHz
   One million instructions take
   I x CPI x C = 10^6 x 1 x 8x10^-9 = 8 msec

Multi-Cycle CPU:
   CPI = 3 to 5, C = 2 ns, f = 500 MHz
   One million instructions take from
   10^6 x 3 x 2x10^-9 = 6 msec
   to 10^6 x 5 x 2x10^-9 = 10 msec,
   depending on instruction mix used.

Assuming the following datapath/control hardware component delays:
   Memory Units: 2 ns
   ALU and adders: 2 ns
   Register File: 1 ns
   Control Unit: < 1 ns
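The comparison above is a direct application of T = I x CPI x C. A minimal sketch, with an illustrative function name and the slide's values as inputs:

```python
def execution_time_ms(instructions, cpi, cycle_ns):
    """T = I x CPI x C, with C given in ns and the result in milliseconds."""
    total_ns = instructions * cpi * cycle_ns
    return total_ns / 1e6


if __name__ == "__main__":
    million = 1_000_000
    print("single-cycle:", execution_time_ms(million, 1, 8), "ms")
    print("multi-cycle, best case:", execution_time_ms(million, 3, 2), "ms")
    print("multi-cycle, worst case:", execution_time_ms(million, 5, 2), "ms")
```

The multi-cycle design beats the single-cycle one only when the average CPI of the instruction mix stays below 4, since 4 x 2 ns equals the 8 ns single-cycle period.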

Control Unit Design:

Finite State Machine (FSM) Control Model

AKA Hardwired Control.

State specifies control points (outputs) for register transfer.
Control points (outputs) are assumed to depend only on the current state
and not on inputs (i.e. Moore finite state machine).
Transfer (register/memory writes) and state transition occur upon exiting
the state, on the falling edge of the clock.

[Figure: Moore finite state machine. The inputs (opcode, conditions) and the current state feed the next-state logic. The control state register (e.g. flip-flops) holds the current state. The output logic maps the current state to the outputs (control points) that go to the datapath. In the state diagram, each state X lists its register transfer control points, and the state transition depends on the inputs.]

Vs. Mealy?

Control Specification For Multi-cycle CPU


Finite State Machine (FSM) - State Transition Diagram

Instruction fetch (start state):
   IR ← MEM[PC]

Decode / operand fetch:
   A ← R[rs]
   B ← R[rt]

Execute (selected by opcode and condition):
   R-type:       R ← A fun B
   ORi:          R ← A or ZX
   LW:           R ← A + SX
   SW:           R ← A + SX
   BEQ & Zero:   PC ← PC + 4 + SX || 00
   BEQ & ~Zero:  PC ← PC + 4

Memory:
   LW:           M ← MEM[R]
   SW:           MEM[R] ← B
                 PC ← PC + 4

Write-back:
   R-type:       R[rd] ← R
                 PC ← PC + 4
   ORi:          R[rt] ← R
                 PC ← PC + 4
   LW:           R[rt] ← M
                 PC ← PC + 4

The last state of each instruction returns to instruction fetch.

13 states: 4 state flip-flops needed.

Traditional FSM Controller


[Figure: traditional FSM controller. The state register (4 flip-flops) holds the current state. The next-state logic implements the state transition table; its 11 inputs are the 6-bit opcode (op), the Equal condition from the datapath, and the 4-bit current state, and its output is the next state. The output logic maps the current state to the outputs (control points) sent to the datapath.]

Traditional FSM Controller


datapath + state diagram => control
   Translate RTN statements into control points.
   Assign states.
   Implement the controller.

More on FSM controller implementation in Appendix C.

Mapping RTNs To Control Points Examples & State Assignments

State   Step                      RTN                                 Control points (examples)
0000    Instruction fetch         IR ← MEM[PC]                        imem_rd, IRen
0001    Decode / operand fetch    A ← R[rs];  B ← R[rt]               Aen, Ben
0010    Execute, BEQ & Zero       PC ← PC + 4 + SX || 00
0011    Execute, BEQ & ~Zero      PC ← PC + 4
0100    Execute, R-type           R ← A fun B                         ALUfun, Sen
0101    Write-back, R-type        R[rd] ← R;  PC ← PC + 4             RegDst, RegWr, PCen
0110    Execute, ORi              R ← A or ZX
0111    Write-back, ORi           R[rt] ← R;  PC ← PC + 4
1000    Execute, LW               R ← A + SX
1001    Memory, LW                M ← MEM[R]
1010    Write-back, LW            R[rt] ← M;  PC ← PC + 4
1011    Execute, SW               R ← A + SX
1100    Memory, SW                MEM[R] ← B;  PC ← PC + 4

States 0010, 0011, 0101, 0111, 1010 and 1100 return to instruction fetch (state 0000).

13 states: 4 state flip-flops needed.

Detailed Control Specification (Partial) State Transition Table


Current State   Op field   Z   Next State   IR en   PC en
0000 (IF)       ??????     ?   0001         1
0001 (ID)       BEQ        0   0011
0001 (ID)       BEQ        1   0010
0001 (ID)       R-type     x   0100
0001 (ID)       orI        x   0110
0001 (ID)       LW         x   1000
0001 (ID)       SW         x   1011
0010 (BEQ)      xxxxxx     x   0000                 1
0011 (BEQ)      xxxxxx     x   0000                 1
0100 (R-type)   xxxxxx     x   0101
0101 (R-type)   xxxxxx     x   0000                 1
0110 (ORI)      xxxxxx     x   0111
0111 (ORI)      xxxxxx     x   0000                 1
1000 (LW)       xxxxxx     x   1001
1001 (LW)       xxxxxx     x   1010
1010 (LW)       xxxxxx     x   0000                 1
1011 (SW)       xxxxxx     x   1100
1100 (SW)       xxxxxx     x   0000                 1

Remaining control-point output columns of the table: PC sel; Ops (A, B); Exec (Ex, Sr, ALU, S); Mem (R, W, M); Write-Back (M-R, Wr, Dst).
[Per-state values of these columns are not recoverable from the extracted slide.]

Slide annotation: "Can be combined in one state".

More on FSM controller implementation in Appendix C.
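The next-state part of this table can be sketched as a small lookup, which also confirms the cycle counts per instruction and the flip-flop count. This is only an illustration: the opcode class names ("rtype", "ori", "lw", "sw", "beq") and the function names are assumptions, while the 4-bit state codes are the ones assigned on the slides.

```python
import math

# Next state out of decode (0001), keyed by (opcode class, Zero).
DECODE = {
    ("beq", 0): "0011", ("beq", 1): "0010",
    ("rtype", None): "0100", ("ori", None): "0110",
    ("lw", None): "1000", ("sw", None): "1011",
}
# States with a single unconditional successor.
SEQUENTIAL = {"0000": "0001", "0100": "0101", "0110": "0111",
              "1000": "1001", "1001": "1010", "1011": "1100"}
# Last state of each instruction; all return to instruction fetch.
FINAL = {"0010", "0011", "0101", "0111", "1010", "1100"}
ALL_STATES = {"0001"} | set(SEQUENTIAL) | FINAL


def next_state(state, op, zero):
    """Moore FSM next-state logic: depends on state, opcode class and Zero."""
    if state == "0001":
        key = (op, zero) if op == "beq" else (op, None)
        return DECODE[key]
    if state in FINAL:
        return "0000"
    return SEQUENTIAL[state]


def cycles(op, zero=0):
    """Count the states visited from fetch until control returns to fetch."""
    state, count = "0000", 0
    while True:
        count += 1
        state = next_state(state, op, zero)
        if state == "0000":
            return count


def flip_flops_needed(num_states):
    """Minimum number of state bits for a binary state encoding."""
    return math.ceil(math.log2(num_states))


if __name__ == "__main__":
    for op in ("rtype", "ori", "lw", "sw", "beq"):
        print(op, cycles(op))
    print("state bits:", flip_flops_needed(len(ALL_STATES)))
```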

Alternative Multiple Cycle Datapath (In Textbook)


Minimizes hardware: 1 memory, 1 ALU.

[Figure: alternative multi-cycle datapath. A single ideal memory (Address, Din, Dout; MemRd, MemWr) is shared by instructions and data, with the address selected by IorD between PC and ALU Out. Other elements: Instruction Reg (IRWr), Mem Data Reg, register file (Ra, Rb, Rw; busA, busB, busW; RegWr, RegDst selecting Rt or Rd), register A, extender for Imm 16, << 2 shifter, a single ALU with ALU Control (ALUOp) and Zero output, ALU Out register, and multiplexors controlled by ALUSrcA, ALUSrcB (inputs 0-3), MemtoReg and PCSrc. The PC is written under PCWr, or PCWrCond together with Zero.]

(3rd Edition Chapter 5.5, see handout. Not in 4th Edition.)

Alternative Multiple Cycle Datapath (In Textbook)


[Figure: textbook multi-cycle datapath. PC, memory (Address, Write data, MemData; MemRead, MemWrite) addressed through the IorD multiplexor, instruction register (IRWrite) with fields rs = Instruction [25-21], rt = Instruction [20-16], rd = Instruction [15-11], imm16 = Instruction [15-0] and Instruction [5-0] to ALU control, memory data register (i.e. MDR), register file (Read register 1/2, Write register, Write data, Read data 1/2; RegWrite, RegDst), sign extend (16 to 32), shift left 2, ALU with Zero and ALU result outputs, ALUOut, and multiplexors controlled by ALUSrcA, ALUSrcB (inputs 0-3, input 1 = constant 4), MemtoReg; ALU control driven by ALUOp.]

Shared instruction/data memory unit.
A single ALU shared among instructions.
Shared units require additional or widened multiplexors.
Temporary registers hold data between clock cycles of the instruction.
Additional registers:
   Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut

(Figure 5.27 page 322)

Alternative Multiple Cycle Datapath With Control Lines (Fig 5.28 In Textbook)

[Figure: the textbook multi-cycle datapath with its control lines. Labeled values: PC, PC+4, branch target, rs, rt, rd, imm16; 32-bit data paths and 2-bit multiplexor selects.]

(ORI not supported, Jump supported)
(Figure 5.28 page 323)

The Effect of The 1-bit Control Signals


Signal name, then effect when deasserted (=0) and effect when asserted (=1):

RegDst
   =0: The register destination number for the write register comes from the rt field
       (instruction bits 20:16).
   =1: The register destination number for the write register comes from the rd field
       (instruction bits 15:11).

RegWrite
   =0: None
   =1: The register on the write register input is written with the value on the Write
       data input.

ALUSrcA
   =0: The first ALU operand is the PC.
   =1: The first ALU operand is register A (i.e. R[rs]).

MemRead
   =0: None
   =1: Content of memory specified by the address input is put on the memory data output.

MemWrite
   =0: None
   =1: Memory contents specified by the address input are replaced by the value on the
       Write data input.

MemtoReg
   =0: The value fed to the register write data input comes from the ALUOut register.
   =1: The value fed to the register write data input comes from the memory data
       register (MDR).

IorD
   =0: The PC is used to supply the address to the memory unit.
   =1: The ALUOut register is used to supply the address to the memory unit.

IRWrite
   =0: None
   =1: The output of the memory is written into the Instruction Register (IR).

PCWrite
   =0: None
   =1: The PC is written; the source is controlled by PCSource.

PCWriteCond (i.e. Branch)
   =0: None
   =1: The PC is written if the Zero output of the ALU is also active.

(Figure 5.29 page 324)

The Effect of The 2-bit Control Signals


Signal name, value (binary) and effect:

ALUOp
   00: The ALU performs an add operation.
   01: The ALU performs a subtract operation.
   10: The funct field of the instruction determines the ALU operation (R-Type).

ALUSrcB
   00: The second input of the ALU comes from register B (i.e. R[rt]).
   01: The second input of the ALU is the constant 4.
   10: The second input of the ALU is the sign-extended 16-bit immediate (imm16)
       field of the instruction in IR.
   11: The second input of the ALU is the sign-extended 16-bit immediate field of IR
       shifted left 2 bits (for branches).

PCSource
   00: Output of the ALU (PC+4) is sent to the PC for writing.
   01: The content of ALUOut (the branch target address) is sent to the PC for writing.
   10: The jump target address (IR[25:0] shifted left 2 bits and concatenated with
       PC+4[31:28]), i.e. the jump address, is sent to the PC for writing.

(Figure 5.29 page 324)
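The two PC sources other than PC+4 can be illustrated with a short sketch of the address arithmetic described above. The function names are hypothetical, and values are masked to 32 bits to mimic the hardware width.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """PC + 4 + (SignExt(imm16) x 4), the value selected by PCSource = 01."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


def jump_target(pc, instruction):
    """IR[25:0] shifted left 2, concatenated with PC+4[31:28] (PCSource = 10)."""
    pc_plus_4 = (pc + 4) & MASK32
    return (pc_plus_4 & 0xF0000000) | ((instruction & 0x03FFFFFF) << 2)


if __name__ == "__main__":
    print(hex(branch_target(0x00400000, 3)))
    print(hex(branch_target(0x00400000, 0xFFFF)))  # offset of -1 word
    print(hex(jump_target(0x00400000, 0x08100004)))
```

An immediate of 0xFFFF is an offset of -1 word, so the branch lands back on the branch instruction itself.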

Operations (Dependent RTN) for Each Cycle

IF (Instruction Fetch), all instructions (R-Type, Load, Store, Branch, Jump):
   IR ← Mem[PC]
   PC ← PC + 4

ID (Instruction Decode), all instructions:
   A ← R[rs]
   B ← R[rt]
   ALUout ← PC + (SignExt(imm16) x4)

EX (Execution):
   R-Type:  ALUout ← A funct B
   Load:    ALUout ← A + SignEx(Imm16)
   Store:   ALUout ← A + SignEx(Imm16)
   Branch:  Zero ← A - B
            Zero: PC ← ALUout
   Jump:    PC ← Jump Address

MEM (Memory):
   Load:    MDR ← Mem[ALUout]
   Store:   Mem[ALUout] ← B

WB (Write Back):
   R-Type:  R[rd] ← ALUout
   Load:    R[rt] ← MDR

Instruction Fetch (IF) & Instruction Decode (ID) cycles
are common for all instructions.

High-Level View of Finite State Machine Control

[Figure: instruction fetch/decode and register fetch, states 0-1 (Figure 5.32), followed by one of: memory access instructions, states 2-5 (Figure 5.33); R-type instructions, states 6-7 (Figure 5.34); branch instruction (Figure 5.35); jump instruction, state 9 (Figure 5.36); then back to fetch.]

First steps are independent of the instruction class.
Then a series of sequences that depend on the instruction opcode.
Then the control returns to fetch a new instruction.
Each box above represents one or several states.

(Figure 5.31 page 332)

FSM State Transition Diagram (From Book)

(Figure 5.38 page 339)

IF:   IR ← Mem[PC]
      PC ← PC + 4

ID:   A ← R[rs]
      B ← R[rt]
      ALUout ← PC + (SignExt(imm16) x4)

EX:   Load/Store:  ALUout ← A + SignEx(Imm16)
      R-Type:      ALUout ← A func B
      Branch:      Zero ← A - B
                   Zero: PC ← ALUout
      Jump:        PC ← Jump Address

MEM:  Load:        MDR ← Mem[ALUout]
      Store:       Mem[ALUout] ← B

WB:   R-Type:      R[rd] ← ALUout
      Load:        R[rt] ← MDR

Total 10 states.

More on FSM controller implementation in Appendix C.

Instruction Fetch (IF) and Decode (ID) FSM States

IF:   IR ← Mem[PC]
      PC ← PC + 4

ID:   A ← R[rs]
      B ← R[rt]
      ALUout ← PC + (SignExt(imm16) x4)

From ID, control continues to the states of Figure 5.33, Figure 5.34, Figure 5.35 or Figure 5.36, depending on the instruction.

(Figure 5.32 page 333)

Instruction Fetch (IF) Cycle (State 0)


IR ← Mem[PC]
PC ← PC + 4

Control signal values:
   MemRead = 1     IorD = 0         IRWrite = 1
   ALUSrcA = 0     ALUSrcB = 01     ALUOp = 00 (add)
   PCWrite = 1     PCSource = 00

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

Instruction Decode (ID) Cycle (State 1)


A ← R[rs]
B ← R[rt]
ALUout ← PC + (SignExt(imm16) x4)    (calculate branch target)

Control signal values:
   ALUSrcA = 0     ALUSrcB = 11     ALUOp = 00 (add)

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

Load/Store Instructions FSM States


(From Instruction Decode)

EX:   ALUout ← A + SignEx(Imm16)    (i.e. effective address calculation)

MEM:  Load:   MDR ← Mem[ALUout]
      Store:  Mem[ALUout] ← B

WB:   Load:   R[rt] ← MDR

To Instruction Fetch (Figure 5.32)

(Figure 5.33 page 334)

Load/Store Execution (EX) Cycle (State 2)


Effective address calculation:
ALUout ← A + SignEx(Imm16)

Control signal values:
   ALUSrcA = 1     ALUSrcB = 10     ALUOp = 00 (add)

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

Load Memory (MEM) Cycle (State 3)


MDR ← Mem[ALUout]

Control signal values:
   MemRead = 1     IorD = 1

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

Load Write Back (WB) Cycle (State 4)


R[rt] ← MDR

Control signal values:
   RegWrite = 1     MemtoReg = 1     RegDst = 0

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

Store Memory (MEM) Cycle (State 5)


Mem[ALUout] ← B

Control signal values:
   MemWrite = 1     IorD = 1

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

R-Type Instructions FSM States

(From Instruction Decode)

EX:   ALUout ← A funct B

WB:   R[rd] ← ALUout

To State 0 (Instruction Fetch) (Figure 5.32)

(Figure 5.34 page 335)

R-Type Execution (EX) Cycle (State 6)


ALUout ← A funct B

Control signal values:
   ALUSrcA = 1     ALUSrcB = 00     ALUOp = 10 (R-Type)

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

R-Type Write Back (WB) Cycle (State 7)


R[rd] ← ALUout

Control signal values:
   RegWrite = 1     MemtoReg = 0     RegDst = 1

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

Branch Instruction: Single EX State

(From Instruction Decode)

EX:   Zero ← A - B
      Zero: PC ← ALUout

To State 0 (Instruction Fetch) (Figure 5.32)

Jump Instruction: Single EX State

(From Instruction Decode)

EX:   PC ← Jump Address

To State 0 (Instruction Fetch) (Figure 5.32)

(Figures 5.35, 5.36 page 337)

Branch Execution (EX) Cycle (State 8)


Zero ← A - B
Zero: PC ← ALUout

Control signal values:
   ALUSrcA = 1         ALUSrcB = 00      ALUOp = 01 (subtract)
   PCWriteCond = 1     PCSource = 01

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)

Jump Execution (EX) Cycle (State 9)


PC ← Jump Address

Control signal values:
   PCWrite = 1     PCSource = 10

(Datapath: Figure 5.28 page 323. ORI not supported, Jump supported.)
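The ten states walked through above can be collected into one table. The sketch below records the control signal values listed for each state and a next-state function matching the transitions described; the opcode class names and function names are assumptions made for illustration, and signals not listed for a state are simply omitted.

```python
# Control signal values listed on the slides for each state (0-9).
CONTROL = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},  # IF
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},                    # ID
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},                    # load/store EX
    3: {"MemRead": 1, "IorD": 1},                                         # load MEM
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},                       # load WB
    5: {"MemWrite": 1, "IorD": 1},                                        # store MEM
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},                    # R-type EX
    7: {"RegWrite": 1, "MemtoReg": 0, "RegDst": 1},                       # R-type WB
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},                              # branch EX
    9: {"PCWrite": 1, "PCSource": 0b10},                                  # jump EX
}

# First state after decode for each (hypothetically named) opcode class.
AFTER_DECODE = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "j": 9}


def next_state(state, op):
    """Next state of the 10-state FSM for a given opcode class."""
    if state == 0:
        return 1
    if state == 1:
        return AFTER_DECODE[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, 9 return to instruction fetch


def state_sequence(op):
    """States visited by one instruction, starting at fetch."""
    seq, state = [0], next_state(0, op)
    while state != 0:
        seq.append(state)
        state = next_state(state, op)
    return seq


if __name__ == "__main__":
    for op in AFTER_DECODE:
        print(op, state_sequence(op))
```

The length of each sequence is the CPI of that instruction class.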

MIPS Multi-cycle Datapath


Performance Evaluation

1 ≤ CPI ≤ 5

What is the average CPI?
   State diagram gives CPI for each instruction type.
   Workload (program) below gives frequency of each type.

Type          CPIi for type   Frequency   CPIi x freqIi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI:                              4.1

Better than CPI = 5 if all instructions took the same number
of clock cycles (5).

C = 2 ns, f = 500 MHz
T = I x CPI x C
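The average CPI is the frequency-weighted sum of the per-type CPIs. A minimal sketch using the table's values, with illustrative names:

```python
# (CPI for type, frequency) per instruction type, from the table.
MIX = {
    "arith_logic": (4, 0.40),
    "load": (5, 0.30),
    "store": (4, 0.10),
    "branch": (3, 0.20),
}


def average_cpi(mix):
    """Sum of CPI_i x freq_i over all instruction types."""
    return sum(cpi * freq for cpi, freq in mix.values())


if __name__ == "__main__":
    print("average CPI =", round(average_cpi(MIX), 2))
```

Changing the frequencies in `MIX` shows how sensitive the result is to the share of loads, the only five-cycle type.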

Adding Support for swap to Multi Cycle Datapath


You are to add support for a new instruction, swap, that
exchanges the values of two registers, to the MIPS multicycle
datapath of Figure 5.28 on page 323.

   swap $rs, $rt
   i.e.  R[rt] ← R[rs]
         R[rs] ← R[rt]

Swap uses the R-Type format with:
the value of field rs = the value of field rd.

Add any necessary datapaths and control signals to the
multicycle datapath. Find a solution that minimizes the
number of clock cycles required for the new instruction without
modifying the register file (i.e. no additional register write ports).
Justify the need for the modifications, if any.

Show the necessary modifications to the multicycle control
finite state machine of Figure 5.38 on page 339 when adding
the swap instruction. For each new state added, provide the
dependent RTN and active control signal values.

Adding swap Instruction Support to Multi Cycle Datapath


swap $rs, $rt
   R[rt] ← R[rs]
   R[rs] ← R[rt]

We assume here rs = rd in the instruction encoding:

   op        rs        rt        rd
   [31-26]   [25-21]   [20-16]   [15-11]

[Figure: multicycle datapath of Figure 5.28 modified for swap. The multiplexor controlled by MemtoReg gains inputs 2 and 3, fed by the outputs of A (R[rs]) and B (R[rt]).]

The outputs of A and B should be connected to the multiplexor controlled by MemtoReg. One of the two fields
(rs and rd) contains the name of one of the registers being swapped. The other register is specified by rt.
The MemtoReg control signal becomes two bits.
Adding swap Instruction Support to Multi Cycle Datapath


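[FSM of Figure 5.38 with two new states added for swap.]

IF:   IR ← Mem[PC]
      PC ← PC + 4

ID:   A ← R[rs]
      B ← R[rt]
      ALUout ← PC + (SignExt(imm16) x4)

Existing paths after ID:
   Load/Store:  EX: ALUout ← A + SignEx(Imm16), then MEM (and WB for load)
   R-Type:      EX: ALUout ← A func B, then WB: R[rd] ← ALUout
   Branch:      EX: Zero ← A - B;  Zero: PC ← ALUout

New states for swap after ID:
   WB1:  R[rd] ← B    (rd = rs)
   WB2:  R[rt] ← A    (A has R[rs])

Swap takes 4 cycles.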
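A small register-level sketch shows why the two write-back states suffice: both old values are already latched in A and B during ID, so the first write cannot destroy the value needed by the second. The function and the plain-list register file are illustrative assumptions, not part of the slides.

```python
def swap_instruction(regs, rs, rt):
    """Model the ID, WB1 and WB2 states of swap on a copy of the register file."""
    regs = list(regs)
    rd = rs                      # encoding assumption from the slide: rd = rs

    # ID: latch both operands.
    a = regs[rs]                 # A <- R[rs]
    b = regs[rt]                 # B <- R[rt]

    # WB1: R[rd] <- B (one register write per cycle).
    regs[rd] = b

    # WB2: R[rt] <- A (A still holds the old R[rs]).
    regs[rt] = a
    return regs


if __name__ == "__main__":
    print(swap_instruction([0, 11, 22, 33], rs=1, rt=2))
```

With IF and ID included, this gives the four cycles stated on the slide.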

Adding Support for add3 to Multi Cycle Datapath

You are to add support for a new instruction, add3, that adds the values of
three registers, to the MIPS multicycle datapath of Figure 5.28 on page 323.
For example:
   add3 $s0, $s1, $s2, $s3
Register $s0 gets the sum of $s1, $s2 and $s3.
The instruction encoding uses a modified R-format, with an additional register
specifier rx replacing the five low bits of the funct field:

   Field     OP        rs        rt        rd        Not used   rx
   Width     6 bits    5 bits    5 bits    5 bits    6 bits     5 bits
   Bits      [31-26]   [25-21]   [20-16]   [15-11]   [10-5]     [4-0]
   Example   add3      $s1       $s2       $s0                  $s3

Add necessary datapath components, connections, and control signals to the multicycle
datapath without modifying the register bank or adding additional ALUs. Find a solution
that minimizes the number of clock cycles required for the new instruction. Justify the
need for the modifications, if any.

Show the necessary modifications to the multicycle control finite state machine of Figure
5.38 on page 339 when adding the add3 instruction. For each new state added, provide
the dependent RTN and active control signal values.

add3 instruction support to Multi Cycle Datapath


add3 $rd, $rs, $rt, $rx
   R[rd] ← R[rs] + R[rt] + R[rx]

rx is a new register specifier in field [4-0] of the instruction.
No additional register read ports or ALUs allowed.

Modified R-Format:

   op        rs        rt        rd        rx
   [31-26]   [25-21]   [20-16]   [15-11]   [4-0]

[Figure: multicycle datapath of Figure 5.28 modified for add3, with the new control lines ReadSrc and WriteB.]

1. ALUout is added as an extra input to the first ALU operand MUX, so that the previous ALU result can be used as an input for the second addition.
2. A multiplexor should be added to select between rt and the new field rx, which contains the register number of the 3rd operand
   (bits 4-0 of the instruction), as the input for Read Register 2.
   This multiplexor is controlled by a new one-bit control signal called ReadSrc.
3. A WriteB control line is added to enable writing R[rx] to B.

add3 instruction support to Multi Cycle Datapath


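[FSM of Figure 5.38 with new states added for add3.]

IF:   IR ← Mem[PC]
      PC ← PC + 4

ID:   A ← R[rs]
      B ← R[rt]    (WriteB)
      ALUout ← PC + (SignExt(imm16) x4)

Existing paths after ID:
   Load/Store:  EX: ALUout ← A + SignEx(Im16), then MEM (and WB for load)
   R-Type:      EX: ALUout ← A func B, then WB: R[rd] ← ALUout
   Branch:      EX: Zero ← A - B;  Zero: PC ← ALUout

New states for add3 after ID:
   EX1:  ALUout ← A + B
         B ← R[rx]    (WriteB)
   EX2:  ALUout ← ALUout + B
   WB:   R[rd] ← ALUout

Add3 takes 5 cycles.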
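The add3 state sequence can be traced with a short register-level sketch. It models one ALU operation per cycle and reuses B for the third operand, as in the RTN above; the function name, the list-based register file and the 32-bit masking are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def add3_instruction(regs, rd, rs, rt, rx):
    """Model the ID, EX1, EX2 and WB states of add3 on a copy of the register file."""
    regs = list(regs)

    # ID: latch the first two operands.
    a = regs[rs]                        # A <- R[rs]
    b = regs[rt]                        # B <- R[rt]

    # EX1: first addition; in the same cycle B is reloaded with R[rx].
    alu_out = (a + b) & MASK32          # ALUout <- A + B
    b = regs[rx]                        # B <- R[rx]  (ReadSrc selects rx, WriteB set)

    # EX2: second addition, feeding ALUout back as the first ALU operand.
    alu_out = (alu_out + b) & MASK32    # ALUout <- ALUout + B

    # WB: write the sum to the destination register.
    regs[rd] = alu_out                  # R[rd] <- ALUout
    return regs


if __name__ == "__main__":
    print(add3_instruction([0, 10, 20, 30], rd=0, rs=1, rt=2, rx=3))
```

Counting IF, ID, EX1, EX2 and WB gives the five cycles stated on the slide.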
