Mips Isa Eecc550

Major CPU Design Steps

Datapath:
1. Analyze instruction set operations using independent RTN:
   ISA => RTN => datapath requirements.
   This provides the required datapath components and how they are
   connected to meet ISA requirements.
2. Select required datapath components and connections, and
   establish the clock methodology (e.g. clock edge-triggered).
   Also determine the number of cycles per instruction and the operations in each cycle.
3. Assemble a datapath meeting the requirements.

Control:
4. Identify and define the function of all control points or
   signals needed by the datapath.
   Analyze the implementation of each instruction to determine the setting of the
   control points that affect its operations and register transfers,
   for each cycle of the instruction.
5. Design & assemble the control logic:
   Hard-Wired: Finite-state machine implementation.
   Microprogrammed: i.e. using a control program
   (3rd Edition Chapter 5.5, see handout; not in 4th Edition).
EECC550 - Shaaban
#1 Lec # 5 Winter 2012 12-18-2012
Single Cycle MIPS Datapath:
[Figure: single-cycle MIPS datapath. Components: PC, next-PC adders (PC+4 and branch target), instruction memory, register file (32 32-bit registers; ports Rw, Ra, Rb; busA = R[rs], busB = R[rt], busW), extender, main ALU with ALU control (driven by ALUop and the function field), data memory. Instruction fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, imm16 <0:15>. Control signals: RegDst, RegWr, ExtOp, ALUSrc, ALUop (2 bits), MemWr, MemtoReg, Branch, Zero, PCSrc.]
CPI = 1, long clock cycle.
T = I x CPI x C
Jump not included. Includes ORI (not in the book version).
Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added

[Figure: single-cycle MIPS datapath with jump support and the control unit (4th Edition Figure 4.24 page 329; 3rd Edition Figure 5.24 page 314). Instruction fields: opcode = Instruction [31-26], rs = Instruction [25-21], rt = Instruction [20-16], rd = Instruction [15-11], imm16 = Instruction [15-0], function field = Instruction [5-0], jump field = Instruction [25-0]. Control unit outputs: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]
Jump address [31-0] = PC + 4 [31-28] concatenated with Instruction [25-0] shifted left 2 (26 bits become 28 bits).
In this book version, ORI is not supported, so no zero extension of the immediate is needed.
ALUOp (2 bits):
00 = add
01 = subtract
10 = R-Type
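The jump address formation shown in the figure can be sketched in a few lines of Python. This is only an illustration of the bit manipulation: the function name `jump_address` and the sample instruction words are assumptions made for the example, not part of the slides.

```python
def jump_address(pc_plus_4: int, instruction: int) -> int:
    """Form the 32-bit jump target from PC+4 and a J-format instruction word."""
    upper = pc_plus_4 & 0xF0000000        # PC + 4 [31-28]
    target26 = instruction & 0x03FFFFFF   # Instruction [25-0]
    return upper | (target26 << 2)        # shifted field fills bits [27-0]


# Hypothetical example: opcode 000010 with a 26-bit target of 0x0100000.
print(hex(jump_address(0x00400004, 0x08100000)))
```

The shift by 2 works because instructions are word aligned, so the two low address bits are always zero.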
Drawbacks of Single-Cycle Processor

1. Long cycle time:
   All instructions must take as much time as the slowest (CPI = 1).
   The cycle time needed for load is longer than needed for all other instructions.
   Real memory is not as well-behaved as idealized memory:
   it cannot always complete a data access in one (short) cycle.
2. Impossible to implement complex, variable-length instructions and
   complex addressing modes in a single cycle,
   e.g. indirect memory addressing.
3. High and duplicated hardware resource requirements:
   no hardware functional unit can be used more than once in
   a single cycle (e.g. ALUs).
4. Cannot pipeline (overlap) the processing of one instruction with the
   previous instructions
   (instruction pipelining: 4th Edition Chapter 4, 3rd Edition Chapter 6).
Abstract View of Single Cycle CPU

[Figure: abstract view of the single-cycle CPU. Stages and delays: Instruction Fetch (2 ns), Register Fetch (1 ns), ALU (2 ns), Data Mem access (2 ns), Result Store / register write (1 ns). Main control decodes op; ALU control uses fun and produces ALUctr. Control signals: RegDst, RegWr, MemRd, MemWr, ExtOp, ALUSrc, ALUctr. Next PC logic uses Equal, Branch, Jump.]
One CPU clock cycle: duration C = 8 ns.
One instruction per cycle: CPI = 1.
Critical path = C = 8 ns (LW).

Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns
Single Cycle Instruction Timing

Arithmetic & Logical:  PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
Load:                  PC -> Inst Memory (2 ns) -> Reg File (1 ns) -> mux -> ALU (2 ns) -> Data Mem (2 ns) -> mux -> setup (1 ns)
Store:                 PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
Branch:                PC -> Inst Memory -> Reg File -> cmp -> mux

Critical path: Load (LW), e.g. C = 8 ns. The critical path determines the CPU clock cycle, C.
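A small Python sketch can make the critical-path reasoning concrete. It uses the component delays assumed on the slides (memory 2 ns, ALU 2 ns, register file 1 ns) and ignores mux, setup and control delays. The dictionary names and the assumption that the branch compare costs one ALU delay are illustrative choices, not taken from the slides.

```python
# Component delays in ns, as assumed on the slides (mux/setup/control ignored).
DELAY = {"imem": 2, "reg_read": 1, "alu": 2, "dmem": 2, "reg_write": 1}

# Units on each instruction class's path. Treating the branch compare as one
# ALU delay is an assumption made for this sketch.
PATHS = {
    "arith_logic": ["imem", "reg_read", "alu", "reg_write"],
    "load": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store": ["imem", "reg_read", "alu", "dmem"],
    "branch": ["imem", "reg_read", "alu"],
}


def path_delay(units):
    """Sum the delays of the units an instruction class passes through."""
    return sum(DELAY[u] for u in units)


delays = {name: path_delay(units) for name, units in PATHS.items()}
critical = max(delays, key=delays.get)   # slowest instruction class
cycle_time_ns = delays[critical]         # single-cycle clock must cover it

print(delays, critical, cycle_time_ns)
```

Under these assumptions the load path is the longest, which is why every instruction in the single-cycle design pays the load's 8 ns.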
Clock Cycle Time & Critical Path

[Figure: clock waveform (Clk). One CPU clock cycle, of duration C = 8 ns here, spans the critical path (i.e. the longest delay), which is LW in this case.]
Critical path: the slowest path between any two storage devices.
Clock cycle time is a function of the critical path, and must be
greater than:
Clock-to-Q + Longest Delay Path through the Combinational Logic
+ Setup + Clock Skew
Reducing Cycle Time: Multi-Cycle Design
Cut the combinational dependency graph by inserting registers / latches.
The same work is done in two or more shorter cycles, rather than one
long cycle.

[Figure: one long cycle (e.g. CPI = 1): storage element -> acyclic combinational logic -> storage element.
=> Two shorter cycles (e.g. CPI = 2): storage element -> acyclic combinational logic (A) in Cycle 1 -> storage element -> acyclic combinational logic (B) in Cycle 2 -> storage element.]

Storage element: register or memory.
Place registers to:
  Get a balanced clock cycle length.
  Save any results needed for the remaining cycles.
Basic MIPS Instruction Processing Steps

Instruction Fetch:   Obtain instruction from program storage (instruction memory).
                     Instruction ← Mem[PC]
Next Instruction:    Update program counter to address of next instruction.
                     PC ← PC + 4
                     (Fetch and PC update are common steps for all instructions.)
Instruction Decode:  Determine instruction type (done by the control unit).
                     Obtain operands from registers.
Execute:             Compute result value or status.
Result Store:        Store result in register/memory if needed
                     (usually called Write Back).

T = I x CPI x C
Partitioning The Single Cycle Datapath

Add registers between steps to break into cycles.
Place registers to:
  Get a balanced clock cycle length.
  Save any results needed for the remaining cycles.

[Figure: the single-cycle datapath partitioned into five cycles by inserted registers:
  1. Instruction Fetch Cycle (IF): 2 ns
  2. Instruction Decode Cycle (ID), operand fetch: 1 ns
  3. Execution Cycle (EX): 2 ns
  4. Data Memory Access Cycle (MEM): 2 ns
  5. Write back Cycle (WB), result store: 1 ns
Control signals shown: RegDst, RegWr, MemRd, MemWr, ALUctr, ALUSrc, ExtOp; the fetched instruction goes to the control unit; Next PC logic uses Branch, Jump.]

Thus: C = 2 ns, f = 500 MHz.
Example Multi-cycle Datapath

[Figure: multi-cycle datapath with five stages separated by registers:
  1. Instruction Fetch (IF): 2 ns
  2. Instruction Decode (ID), read registers: 1 ns
  3. Execution (EX): 2 ns
  4. Memory (MEM): 2 ns
  5. Write Back (WB), write to register: 1 ns
Control signals shown: RegDst, RegWr, MemToReg, MemRd, MemWr, ALUSrc, ALUctr, ExtOp; IR feeds the control unit; Next PC logic uses Equal, Branch, Jump.]

All registers are clock-edge triggered (register write enable control lines are not shown).
Registers added:
IR:    Instruction register
A, B:  Two registers to hold operands read from the register file, i.e. R[rs], R[rt]
R:     or ALUOut, holds the output of the main ALU (ALU result)
M:     or Memory data register (MDR), holds data read from data memory

CPU clock cycle time: worst cycle delay = C = 2 ns
Thus clock rate: f = 1 / 2 ns = 500 MHz
(ignoring MUX, CLK-Q delays)
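As a quick check of the clock-rate arithmetic, the sketch below takes the per-stage delays given on the slide and derives the cycle time as the worst stage delay. The variable names are illustrative, and MUX and clock-to-Q delays are ignored as on the slide.

```python
# Per-stage delays in ns for the example multi-cycle datapath.
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# The clock must be long enough for the slowest stage.
cycle_ns = max(STAGE_DELAY_NS.values())

# 1 / (1 ns) = 1000 MHz, so f in MHz is 1000 / cycle time in ns.
freq_mhz = 1000 / cycle_ns

print(cycle_ns, freq_mhz)
```

The 1 ns stages (ID and WB) leave half of their cycle idle, which is the cost of an unbalanced partition.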
Operations (Dependent RTN) for Each Cycle

IF (Instruction Fetch), all instructions:
    IR ← Mem[PC]
ID (Instruction Decode), all instructions:
    A ← R[rs]
    B ← R[rt]
EX (Execution):
    R-Type:           R ← A funct B
    Logic Immediate:  R ← A OR ZeroExt[imm16]
    Load:             R ← A + SignEx(Im16)
    Store:            R ← A + SignEx(Im16)
    Branch:           Zero ← A - B
                      If Zero = 1: PC ← PC + 4 + (SignExt(imm16) x4)
                      else (i.e. Zero = 0): PC ← PC + 4
MEM (Memory):
    Load:             M ← Mem[R]
    Store:            Mem[R] ← B;  PC ← PC + 4
WB (Write Back):
    R-Type:           R[rd] ← R;  PC ← PC + 4
    Logic Immediate:  R[rt] ← R;  PC ← PC + 4
    Load:             R[rt] ← M;  PC ← PC + 4

Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions.
MIPS Multi-Cycle Datapath: Five Cycles of Load

Load (CPI = 5):  Cycle 1: IF,  Cycle 2: ID,  Cycle 3: EX,  Cycle 4: MEM,  Cycle 5: WB

1- Instruction Fetch (IF): Fetch the instruction from instruction memory.
2- Instruction Decode (ID): Operand register fetch and instruction decode.
3- Execute (EX): Calculate the effective memory address.
4- Memory (MEM): Read the data from the data memory.
5- Write Back (WB): Write the loaded data to the register file. Update PC.
Multi-cycle Datapath Instruction CPI

R-Type/Immediate: Require four cycles, CPI = 4
    IF, ID, EX, WB
Loads: Require five cycles, CPI = 5
    IF, ID, EX, MEM, WB
Stores: Require four cycles, CPI = 4
    IF, ID, EX, MEM
Branches/Jumps: Require three cycles, CPI = 3
    IF, ID, EX
Average or effective program CPI:
    3 ≤ CPI ≤ 5
    depending on program profile (instruction mix).
C = 2 ns, f = 500 MHz
Single Cycle Vs. Multi-Cycle CPU

[Figure: timing comparison.
Single cycle implementation: 8 ns clock (125 MHz). Cycle 1: Load. Cycle 2: Store, followed by wasted time.
Multiple cycle implementation: 2 ns clock (500 MHz). Cycles 1-5: Load (IF, ID, EX, MEM, WB). Cycles 6-9: Store (IF, ID, EX, MEM). Cycle 10: R-type begins (IF).]

T = I x CPI x C

Single-Cycle CPU:
CPI = 1, C = 8 ns, f = 125 MHz
One million instructions take
I x CPI x C = 10^6 x 1 x 8x10^-9 = 8 msec

Multi-Cycle CPU:
CPI = 3 to 5, C = 2 ns, f = 500 MHz
One million instructions take from
10^6 x 3 x 2x10^-9 = 6 msec
to 10^6 x 5 x 2x10^-9 = 10 msec
depending on the instruction mix used.
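The comparison follows directly from T = I x CPI x C. The sketch below reproduces the slide's numbers; the function name `execution_time` is an assumption made for the example.

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """CPU execution time in seconds: T = I x CPI x C."""
    return instruction_count * cpi * cycle_time_s


I = 10**6  # one million instructions, as on the slide

single_cycle = execution_time(I, 1, 8e-9)   # CPI = 1, C = 8 ns
multi_best = execution_time(I, 3, 2e-9)     # CPI = 3, C = 2 ns
multi_worst = execution_time(I, 5, 2e-9)    # CPI = 5, C = 2 ns

for label, t in [("single", single_cycle), ("multi best", multi_best),
                 ("multi worst", multi_worst)]:
    print(f"{label}: {t * 1e3:.1f} msec")
```

With these cycle times the multi-cycle design wins only when the average CPI is below 8 ns / 2 ns = 4, so the instruction mix decides which implementation is faster.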
Control Unit Design:
Finite State Machine (FSM) Control Model
State specifies control points (outputs) for register transfer (AKA hardwired control).
Control points (outputs) are assumed to depend only on the current state
and not on inputs (i.e. Moore finite state machine).
Transfer (register/memory writes) and state transition occur upon exiting
the state on the falling edge of the clock.

[Figure: Moore finite state machine. The inputs (opcode, conditions) and the current state feed the next state logic; the current state is held in a state register (e.g. flip-flops); the output logic derives the outputs (control points) sent to the datapath from the current state only. Each control state specifies its register transfer and control points; the state transition depends on the inputs. Vs. Mealy?]
Control Specification For Multi-cycle CPU

Finite State Machine (FSM) - State Transition Diagram

instruction fetch (start state):
    IR ← MEM[PC]
decode / operand fetch:
    A ← R[rs]
    B ← R[rt]
Execute:
    R-type:       R ← A fun B
    ORi:          R ← A or ZX
    LW:           R ← A + SX
    SW:           R ← A + SX
    BEQ & Zero:   PC ← PC + 4 + SX || 00
    BEQ & ~Zero:  PC ← PC + 4
Memory:
    LW:           M ← MEM[R]
    SW:           MEM[R] ← B;  PC ← PC + 4
Write-back:
    R-type:       R[rd] ← R;  PC ← PC + 4
    ORi:          R[rt] ← R;  PC ← PC + 4
    LW:           R[rt] ← M;  PC ← PC + 4
The last state of every instruction returns to instruction fetch.

13 states: 4 state flip-flops needed.
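The flip-flop count follows from needing enough state bits for a distinct binary code per state. A minimal sketch, assuming a plain binary state encoding (one-hot or other encodings would need more flip-flops); the function name is hypothetical:

```python
def state_flip_flops(num_states: int) -> int:
    """Minimum flip-flops for a binary-encoded FSM: ceil(log2(num_states))."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, using integers only.
    return max(1, (num_states - 1).bit_length())


print(state_flip_flops(13))  # the 13-state controller on this slide
```

Four flip-flops give 16 codes, so three codes are unused in the 13-state controller.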
Traditional FSM Controller

[Figure: traditional FSM controller. A state transition table with columns (state, op, cond) -> (next state, control points) is implemented by two logic blocks. Next state logic inputs: the 6-bit opcode, the Equal condition and the 4-bit current state (11 inputs in total). The state register (4 flip-flops) holds the current state. The output logic produces the outputs (control points) sent to the datapath.]
Traditional FSM Controller

datapath + state diagram => control
Translate RTN statements into
control points.
Assign states.
Implement the controller.
More on FSM controller implementation in Appendix C
Mapping RTNs To Control Points Examples & State Assignments

State        Cycle                   Instruction   RTN                                Control points (examples given)
0000         instruction fetch       all           IR ← MEM[PC]                       imem_rd, IRen
0001         decode / operand fetch  all           A ← R[rs]; B ← R[rt]               Aen, Ben
0100         Execute                 R-type        R ← A fun B                        ALUfun, Sen
0101         Write-back              R-type        R[rd] ← R; PC ← PC + 4             RegDst, RegWr, PCen
0110         Execute                 ORi           R ← A or ZX
0111         Write-back              ORi           R[rt] ← R; PC ← PC + 4
1000         Execute                 LW            R ← A + SX
1001         Memory                  LW            M ← MEM[R]
1010 (10)    Write-back              LW            R[rt] ← M; PC ← PC + 4
1011 (11)    Execute                 SW            R ← A + SX
1100 (12)    Memory                  SW            MEM[R] ← B; PC ← PC + 4
0011         Execute                 BEQ & ~Zero   PC ← PC + 4
0010         Execute                 BEQ & Zero    PC ← PC + 4 + SX || 00

The last state of every instruction returns to instruction fetch (state 0000).
13 states: 4 state flip-flops needed.
Detailed Control Specification (Partial) State Transition Table

Current state   Op field   Z   Next state   IR en   PC en   PC sel   A, B en   Exec (Ex Sr ALU S)
0000 (IF)       ??????     ?   0001         1
0001 (ID)       BEQ        0   0011                                  1 1
0001 (ID)       BEQ        1   0010                                  1 1
0001 (ID)       R-type     x   0100                                  1 1
0001 (ID)       ORI        x   0110                                  1 1
0001 (ID)       LW         x   1000                                  1 1
0001 (ID)       SW         x   1011                                  1 1
0010 (BEQ)      xxxxxx     x   0000                 1       1
0011 (BEQ)      xxxxxx     x   0000                 1       0
0100 (R-type)   xxxxxx     x   0101                                            0 1 fun
0101            xxxxxx     x   0000                 1
0110 (ORI)      xxxxxx     x   0111                                            0 0 or
0111            xxxxxx     x   0000                 1
1000 (LW)       xxxxxx     x   1001                                            1 0 add 1
1001            xxxxxx     x   1010
1010            xxxxxx     x   0000                 1
1011 (SW)       xxxxxx     x   1100                                            1 0 add 1
1100            xxxxxx     x   0000                 1

Output column groups on the slide: IR; PC (en, sel); Ops (A, B); Exec (Ex, Sr, ALU, S); Mem (R, W, M); Write-Back (M-R, Wr, Dst). The Mem and Write-Back entries are not legible in this copy.
Note on the slide next to the BEQ rows: can be combined in one state.
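The next-state portion of this table can be sketched as a lookup in Python. The state codes and transitions below are the ones listed on the slides; the function and dictionary names, and the string spelling of the opcodes, are assumptions made for the example.

```python
# Dispatch out of the decode state (0001) for non-branch opcodes.
DECODE_DISPATCH = {"R-type": "0100", "ORI": "0110", "LW": "1000", "SW": "1011"}

# States whose successor does not depend on the opcode or on Zero.
SEQUENCE = {
    "0010": "0000", "0011": "0000",                  # BEQ taken / not taken
    "0100": "0101", "0101": "0000",                  # R-type execute, write-back
    "0110": "0111", "0111": "0000",                  # ORI execute, write-back
    "1000": "1001", "1001": "1010", "1010": "0000",  # LW execute, memory, write-back
    "1011": "1100", "1100": "0000",                  # SW execute, memory
}


def next_state(state: str, op: str, zero: int) -> str:
    """Next-state logic: depends on the inputs only in the decode state."""
    if state == "0000":
        return "0001"
    if state == "0001":
        if op == "BEQ":
            return "0010" if zero else "0011"
        return DECODE_DISPATCH[op]
    return SEQUENCE[state]


def trace(op: str, zero: int = 0):
    """List the states one instruction visits, starting at instruction fetch."""
    states = ["0000"]
    while True:
        nxt = next_state(states[-1], op, zero)
        if nxt == "0000":
            return states
        states.append(nxt)


for op in ("R-type", "ORI", "LW", "SW", "BEQ"):
    print(op, trace(op), "cycles:", len(trace(op)))
```

The length of each trace is that instruction's CPI, which matches the per-class cycle counts given earlier in the lecture.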
Alternative Multiple Cycle Datapath (In Textbook)

Minimizes hardware: 1 memory, 1 ALU.

[Figure: alternative multi-cycle datapath. Components: PC, a single ideal memory (Address, Din, Dout) shared by instructions and data, Instruction Reg, Mem Data Reg, Reg File (Ra, Rb, Rw; busA, busB, busW) with A and B registers, Extend (Imm 16 to 32 bits), << 2, a single ALU with ALU Control, ALU Out. Control signals: PCWr, PCWrCond, PCSrc, IorD, MemRd, MemWr, IRWr, RegDst, RegWr, MemtoReg, ALUSrcA, ALUSrcB (inputs 0-3), ALUOp.]

3rd Edition Chapter 5.5 (see handout). Not in 4th Edition.
Alternative Multiple Cycle Datapath (In Textbook)

[Figure 5.27, page 322: alternative multi-cycle datapath. Components: PC, memory (Address, MemData, Write data), instruction register, memory data register (i.e. MDR), registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), A and B, sign extend (16 to 32 bits), shift left 2, ALU (Zero, ALU result), ALU control, ALUOut. Instruction fields: rs = Instruction [25-21], rt = Instruction [20-16], rd = Instruction [15-11], imm16 = Instruction [15-0], Instruction [5-0] to ALU control. Control signals: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB (inputs 0-3, input 1 is the constant 4), ALUOp.]

Shared instruction/data memory unit.
A single ALU shared among instructions.
Shared units require additional or widened multiplexors.
Temporary registers hold data between clock cycles of the instruction.
Additional registers:
Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut
Alternative Multiple Cycle Datapath With Control Lines (Fig 5.28 In Textbook)

[Figure 5.28: the alternative multi-cycle datapath with its control lines. Labels shown: PC, PC+4, Branch Target, rs, rt, rd, imm16; 2-bit control signals are marked with a 2.]
(ORI not supported, Jump supported)
The Effect of The 1-bit Control Signals

RegDst
    Deasserted (=0): The register destination number for the write register comes from the rt field (instruction bits 20:16).
    Asserted (=1):   The register destination number for the write register comes from the rd field (instruction bits 15:11).
RegWrite
    Deasserted (=0): None
    Asserted (=1):   The register on the write register input is written with the value on the Write data input.
ALUSrcA
    Deasserted (=0): The first ALU operand is the PC.
    Asserted (=1):   The first ALU operand is register A (i.e. R[rs]).
MemRead
    Deasserted (=0): None
    Asserted (=1):   The content of memory specified by the address input is put on the memory data output.
MemWrite
    Deasserted (=0): None
    Asserted (=1):   The memory content specified by the address input is replaced by the value on the Write data input.
MemtoReg
    Deasserted (=0): The value fed to the register write data input comes from the ALUOut register.
    Asserted (=1):   The value fed to the register write data input comes from the memory data register (MDR).
IorD
    Deasserted (=0): The PC is used to supply the address to the memory unit.
    Asserted (=1):   The ALUOut register is used to supply the address to the memory unit.
IRWrite
    Deasserted (=0): None
    Asserted (=1):   The output of the memory is written into the Instruction Register (IR).
PCWrite
    Deasserted (=0): None
    Asserted (=1):   The PC is written; the source is controlled by PCSource.
PCWriteCond (i.e. Branch)
    Deasserted (=0): None
    Asserted (=1):   The PC is written if the Zero output of the ALU is also active.
The Effect of The 2-bit Control Signals

Signal name   Value (binary)   Effect
ALUOp         00               The ALU performs an add operation.
              01               The ALU performs a subtract operation.
              10               The funct field of the instruction determines the ALU operation (R-Type).
ALUSrcB       00               The second input of the ALU comes from register B (i.e. R[rt]).
              01               The second input of the ALU is the constant 4.
              10               The second input of the ALU is the sign-extended 16-bit immediate (imm16) field of the instruction in IR.
              11               The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits (for branches).
PCSource      00               Output of the ALU (PC+4) is sent to the PC for writing.
              01               The content of ALUOut (the branch target address) is sent to the PC for writing.
              10               The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]), i.e. the jump address, is sent to the PC for writing.
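The four ALUSrcB settings amount to a 4-input multiplexor in front of the ALU's second operand. The sketch below models that selection on 32-bit values; the function names and the use of Python integers masked to 32 bits are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit two's-complement pattern."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def alu_src_b(select: int, b: int, imm16: int) -> int:
    """Second ALU operand as chosen by the 2-bit ALUSrcB control signal."""
    if select == 0b00:
        return b & MASK32                           # register B
    if select == 0b01:
        return 4                                    # constant 4 (for PC + 4)
    if select == 0b10:
        return sign_extend16(imm16)                 # load/store offset
    if select == 0b11:
        return (sign_extend16(imm16) << 2) & MASK32  # branch offset in bytes
    raise ValueError("ALUSrcB is a 2-bit signal")


# imm16 = 0xFFFF encodes -1: the branch offset is -4 bytes.
print(hex(alu_src_b(0b11, 0, 0xFFFF)))
```

Setting 01 is what lets the single shared ALU compute PC + 4 during instruction fetch instead of a dedicated adder.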
Operations (Dependent RTN) for Each Cycle

IF (Instruction Fetch), all instructions:
    IR ← Mem[PC]
    PC ← PC + 4
ID (Instruction Decode), all instructions:
    A ← R[rs]
    B ← R[rt]
    ALUout ← PC + (SignExt(imm16) x4)
EX (Execution):
    R-Type:   ALUout ← A funct B
    Load:     ALUout ← A + SignEx(Imm16)
    Store:    ALUout ← A + SignEx(Imm16)
    Branch:   Zero ← A - B
              Zero: PC ← ALUout
    Jump:     PC ← Jump Address
MEM (Memory):
    Load:     MDR ← Mem[ALUout]
    Store:    Mem[ALUout] ← B
WB (Write Back):
    R-Type:   R[rd] ← ALUout
    Load:     R[rt] ← MDR

Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions.
High-Level View of Finite State Machine Control

[Figure: Start -> instruction fetch/decode and register fetch (Figure 5.32, states 0-1) -> one of:
  memory access instructions (Figure 5.33, states 2-5),
  R-type instructions (Figure 5.34, states 6-7),
  branch instruction (Figure 5.35, state 8),
  jump instruction (Figure 5.36, state 9).]

First steps are independent of the instruction class.
Then a series of sequences that depend on the instruction opcode.
Then the control returns to fetch a new instruction.
Each box above represents one or several states.
FSM State Transition Diagram (From Book)

IF:   IR ← Mem[PC]
      PC ← PC + 4
ID:   A ← R[rs]
      B ← R[rt]
      ALUout ← PC + (SignExt(imm16) x4)
EX:   Load/Store:  ALUout ← A + SignEx(Imm16)
      R-Type:      ALUout ← A func B
      Branch:      Zero ← A - B
                   Zero: PC ← ALUout
      Jump:        PC ← Jump Address
MEM:  Load:        MDR ← Mem[ALUout]
      Store:       Mem[ALUout] ← B
WB:   R-Type:      R[rd] ← ALUout
      Load:        R[rt] ← MDR

Total 10 states.
Instruction Fetch (IF) and Decode (ID) FSM States

IF:   IR ← Mem[PC]
      PC ← PC + 4
ID:   A ← R[rs]
      B ← R[rt]
      ALUout ← PC + (SignExt(imm16) x4)

From ID, control continues to the state sequences of Figures 5.33, 5.34, 5.35 and 5.36, depending on the opcode.
Instruction Fetch (IF) Cycle (State 0)

IR ← Mem[PC]
PC ← PC + 4

MemRead = 1
IorD = 0
IRWrite = 1
ALUSrcA = 0
ALUSrcB = 01
ALUOp = 00 (add)
PCWrite = 1
PCSource = 00

[Figure: datapath of Figure 5.28 with the control values above marked.]
Instruction Decode (ID) Cycle (State 1)

A ← R[rs]
B ← R[rt]
ALUout ← PC + (SignExt(imm16) x4)   (calculate branch target)

ALUSrcA = 0
ALUSrcB = 11
ALUOp = 00 (add)

[Figure: datapath of Figure 5.28 with the control values above marked.]
Load/Store Instructions FSM States

(From Instruction Decode)
EX:   ALUout ← A + SignEx(Imm16)   (i.e. effective address calculation)
MEM:  Load:   MDR ← Mem[ALUout]
      Store:  Mem[ALUout] ← B
WB:   Load:   R[rt] ← MDR
To Instruction Fetch (Figure 5.32)
Load/Store Execution (EX) Cycle (State 2)

Effective address calculation:
ALUout ← A + SignEx(Imm16)

ALUSrcA = 1
ALUSrcB = 10
ALUOp = 00 (add)

[Figure: datapath of Figure 5.28 with the control values above marked.]
Load Memory (MEM) Cycle (State 3)

MDR ← Mem[ALUout]

MemRead = 1
IorD = 1

[Figure: datapath of Figure 5.28 with the control values above marked.]
Load Write Back (WB) Cycle (State 4)

R[rt] ← MDR

RegWrite = 1
MemtoReg = 1
RegDst = 0

[Figure: datapath of Figure 5.28 with the control values above marked.]
Store Memory (MEM) Cycle (State 5)

Mem[ALUout] ← B

MemWrite = 1
IorD = 1

[Figure: datapath of Figure 5.28 with the control values above marked.]
R-Type Instructions FSM States

EX:   ALUout ← A funct B
WB:   R[rd] ← ALUout
To State 0 (Instruction Fetch) (Figure 5.32)
R-Type Execution (EX) Cycle (State 6)

ALUout ← A funct B

ALUSrcA = 1
ALUSrcB = 00
ALUOp = 10 (R-Type)

[Figure: datapath of Figure 5.28 with the control values above marked.]
R-Type Write Back (WB) Cycle (State 7)

R[rd] ← ALUout

RegWrite = 1
MemtoReg = 0
RegDst = 1

[Figure: datapath of Figure 5.28 with the control values above marked.]
Branch Instruction: Single EX State
EX:   Zero ← A - B
      Zero: PC ← ALUout
Jump Instruction: Single EX State
EX:   PC ← Jump Address
Both return to instruction fetch (Figure 5.32).
(Figures 5.35, 5.36 page 337)
Branch Execution (EX) Cycle (State 8)

Zero ← A - B
Zero: PC ← ALUout

ALUSrcA = 1
ALUSrcB = 00
ALUOp = 01 (subtract)
PCWriteCond = 1
PCSource = 01

[Figure: datapath of Figure 5.28 with the control values above marked.]
Jump Execution (EX) Cycle (State 9)

PC ← Jump Address

PCWrite = 1
PCSource = 10

[Figure: datapath of Figure 5.28 with the control values above marked.]
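The ten states and their control settings can be collected into one table, which is essentially what the output logic of a Moore controller implements. The sketch below copies the per-state values from the slides; the dictionary layout, the lowercase opcode names and the helper functions are assumptions made for the example. Signals not listed for a state are treated as deasserted for write enables and as don't-care for multiplexor selects.

```python
# Control outputs per state (0-9), as listed on the slides.
CONTROL = {
    0: {"MemRead": 1, "IorD": 0, "IRWrite": 1, "ALUSrcA": 0, "ALUSrcB": 0b01,
        "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},            # IF
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},             # ID
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},             # load/store EX
    3: {"MemRead": 1, "IorD": 1},                                  # load MEM
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},                # load WB
    5: {"MemWrite": 1, "IorD": 1},                                 # store MEM
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},             # R-type EX
    7: {"RegWrite": 1, "MemtoReg": 0, "RegDst": 1},                # R-type WB
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},                       # branch EX
    9: {"PCWrite": 1, "PCSource": 0b10},                           # jump EX
}

# Dispatch out of the decode state; opcode names are illustrative.
DISPATCH = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "j": 9}


def next_state(state: int, op: str) -> int:
    """Next-state logic of the 10-state controller."""
    if state == 0:
        return 1
    if state == 1:
        return DISPATCH[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, 9 return to instruction fetch


def state_sequence(op: str):
    """States visited by one instruction, starting at state 0."""
    seq = [0]
    while True:
        nxt = next_state(seq[-1], op)
        if nxt == 0:
            return seq
        seq.append(nxt)


for op in DISPATCH:
    print(op, state_sequence(op))
```

Because the outputs depend only on the state number, adding an instruction means adding rows to `CONTROL` and entries to the dispatch, which is the shape of the swap and add3 exercises that follow.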
MIPS Multi-cycle Datapath Performance Evaluation

What is the average CPI?
State diagram gives CPI for each instruction type.
Workload (program) below gives frequency of each type.

Type          CPIi for type   Frequency   CPIi x freqIi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
branch        3               20%         0.6
Average CPI:                              4.1

Better than CPI = 5 if all instructions took the same number
of clock cycles (5).
C = 2 ns, f = 500 MHz
T = I x CPI x C
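The average CPI is a frequency-weighted sum over the instruction classes. A minimal sketch using the workload from the table above; the per-class cycle counts are the multi-cycle CPIs stated earlier in the lecture, and the variable names are illustrative.

```python
# (cycles per instruction, fraction of the workload) for each class.
MIX = {
    "arith_logic": (4, 0.40),
    "load": (5, 0.30),
    "store": (4, 0.10),
    "branch": (3, 0.20),
}

# Average CPI = sum over classes of CPI_i x frequency_i.
average_cpi = sum(cpi * freq for cpi, freq in MIX.values())

# Time for one million instructions at C = 2 ns, in msec (T = I x CPI x C).
time_ms = 10**6 * average_cpi * 2e-9 * 1e3

print(round(average_cpi, 2), round(time_ms, 2))
```

For this particular mix the multi-cycle time per million instructions comes out slightly above the 8 msec of the 8 ns single-cycle design, so the benefit depends on the workload.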
Adding Support for swap to Multi Cycle Datapath

You are to add support for a new instruction, swap, that
exchanges the values of two registers, to the MIPS multicycle
datapath of Figure 5.28 on page 232.

swap $rs, $rt
i.e. R[rt] ← R[rs]
     R[rs] ← R[rt]

Swap uses the R-Type format with:
the value of field rs = the value of field rd

Add any necessary datapaths and control signals to the
multicycle datapath. Find a solution that minimizes the
number of clock cycles required for the new instruction without
modifying the register file (i.e. no additional register write ports).
Justify the need for the modifications, if any.

Show the necessary modifications to the multicycle control
finite state machine of Figure 5.38 on page 339 when adding
the swap instruction. For each new state added, provide the
dependent RTN and active control signal values.
Adding swap Instruction Support to Multi Cycle Datapath

swap $rs, $rt
R[rt] ← R[rs]
R[rs] ← R[rt]
We assume here rs = rd in the instruction encoding.

Format:  op [31-26] | rs [25-21] | rt [20-16] | rd [15-11]

[Figure: multicycle datapath of Figure 5.28 modified for swap. The outputs of A (R[rs]) and B (R[rt]) are added as inputs 2 and 3 of the multiplexor controlled by MemtoReg.]

The outputs of A and B should be connected to the multiplexor controlled by MemtoReg if one of the two fields
(rs and rd) contains the name of one of the registers being swapped. The other register is specified by rt.
The MemtoReg control signal becomes two bits.
Adding swap Instruction Support to Multi Cycle Datapath

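[Figure: FSM state transition diagram from the book with two new states added for swap.]

Existing states (unchanged):
IF:   IR ← Mem[PC]
      PC ← PC + 4
ID:   A ← R[rs]
      B ← R[rt]
      ALUout ← PC + (SignExt(imm16) x4)
EX:   ALUout ← A + SignEx(Imm16);  ALUout ← A func B;  Zero ← A - B, Zero: PC ← ALUout
MEM and WB states as before, including R[rd] ← ALUout.

New states for swap (entered from ID):
WB1:  R[rd] ← B   (rd = rs)
WB2:  R[rt] ← A   (A has R[rs])

Swap takes 4 cycles.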
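The RTN of the two new write-back states can be checked with a tiny register-level simulation. This is only a sketch of the register transfers: the function name, the dictionary register file and the register names are assumptions, and control signal values are deliberately left out because the slide does not list them.

```python
def run_swap(regs: dict, rs: str, rt: str):
    """Step through IF, ID, WB1, WB2 for swap and return (new registers, cycles)."""
    regs = dict(regs)          # do not modify the caller's register file
    rd = rs                    # encoding assumption from the slide: rd = rs

    # IF: instruction fetch and PC update (no register file effect).
    # ID: A <- R[rs]; B <- R[rt]
    a, b = regs[rs], regs[rt]

    # WB1: R[rd] <- B. One write port, so only one register is written.
    regs[rd] = b
    # WB2: R[rt] <- A. A still holds the old R[rs].
    regs[rt] = a

    return regs, 4             # IF, ID, WB1, WB2


after, cycles = run_swap({"$t0": 11, "$t1": 22}, "$t0", "$t1")
print(after, cycles)
```

Latching both operands in ID is what makes the exchange safe: WB2 reads the A register, not the register file entry that WB1 has just overwritten.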
Adding Support for add3 to Multi Cycle Datapath
You are to add support for a new instruction, add3, that adds the values of
three registers, to the MIPS multicycle datapath of Figure 5.28 on page 232.
For example:
add3 $s0, $s1, $s2, $s3
Register $s0 gets the sum of $s1, $s2 and $s3.
The instruction encoding uses a modified R-format, with an additional register
specifier rx replacing the five low bits of the funct field.

Field:    OP        rs        rt        rd        Not used   rx
Bits:     [31-26]   [25-21]   [20-16]   [15-11]   [10-5]     [4-0]
Width:    6 bits    5 bits    5 bits    5 bits    6 bits     5 bits
Example:  add3      $s1       $s2       $s0                  $s3

Add necessary datapath components, connections, and control signals to the multicycle
datapath without modifying the register bank or adding additional ALUs. Find a solution
that minimizes the number of clock cycles required for the new instruction. Justify the
need for the modifications, if any.
Show the necessary modifications to the multicycle control finite state machine of Figure
5.38 on page 339 when adding the add3 instruction. For each new state added, provide
the dependent RTN and active control signal values.
add3 instruction support to Multi Cycle Datapath

add3 $rd, $rs, $rt, $rx
R[rd] ← R[rs] + R[rt] + R[rx]
rx is a new register specifier in field [4-0] of the instruction.
No additional register read ports or ALUs allowed.

Modified R-Format:  op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | not used [10-5] | rx [4-0]

[Figure: multicycle datapath of Figure 5.28 modified for add3, with the new control lines ReadSrc and WriteB.]

1. ALUout is added as an extra input to the first ALU operand MUX, to use the previous ALU result as an input for the second addition.
2. A multiplexor should be added to select between rt and the new field rx, containing the register number of the 3rd operand
(bits 4-0 of the instruction), as the input for Read Register 2.
This multiplexor will be controlled by a new one-bit control signal called ReadSrc.
3. A WriteB control line is added to enable writing R[rx] to B.
add3 instruction support to Multi Cycle Datapath

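[Figure: FSM state transition diagram from the book with two new states added for add3.]

Existing states:
IF:   IR ← Mem[PC]
      PC ← PC + 4
ID:   A ← R[rs]
      B ← R[rt]   (WriteB)
      ALUout ← PC + (SignExt(imm16) x4)
EX:   ALUout ← A + SignEx(Im16);  ALUout ← A func B;  Zero ← A - B, Zero: PC ← ALUout
MEM and WB states as before.

New states for add3 (entered from ID):
EX1:  ALUout ← A + B
      B ← R[rx]   (WriteB)
EX2:  ALUout ← ALUout + B
WB:   R[rd] ← ALUout

Add3 takes 5 cycles.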
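The add3 state sequence can be traced the same way, following the RTN of each cycle. The sketch below is a register-level illustration only: the function name, the dictionary register file and the 32-bit masking are assumptions, and the ReadSrc values are not modelled because the slide does not give them.

```python
MASK32 = 0xFFFFFFFF


def run_add3(regs: dict, rd: str, rs: str, rt: str, rx: str):
    """Step through IF, ID, EX1, EX2, WB for add3 and return (new registers, cycles)."""
    regs = dict(regs)               # do not modify the caller's register file

    # IF: instruction fetch and PC update (no register file effect).
    # ID: A <- R[rs]; B <- R[rt]
    a, b = regs[rs], regs[rt]

    # EX1: ALUout <- A + B, and in the same cycle B <- R[rx] (WriteB asserted).
    # Both transfers read the old B, as with edge-triggered registers.
    alu_out = (a + b) & MASK32
    b = regs[rx]

    # EX2: ALUout <- ALUout + B, using the new ALUout input of the first operand MUX.
    alu_out = (alu_out + b) & MASK32

    # WB: R[rd] <- ALUout
    regs[rd] = alu_out

    return regs, 5                  # IF, ID, EX1, EX2, WB


after, cycles = run_add3({"$s0": 0, "$s1": 1, "$s2": 2, "$s3": 3},
                         "$s0", "$s1", "$s2", "$s3")
print(after["$s0"], cycles)
```

Reading R[rx] during EX1 overlaps the third operand fetch with the first addition, which is how the instruction stays at five cycles with a single ALU and two read ports.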
