5 Datapath Control 2

PROCESSOR:
DATAPATH & CONTROL - 2

Dr. Bill Yi
Santa Clara University
(Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design:
The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)
(Also based on presentation: Dr. Nam Ling, COEN210 Lecture Notes)
COURSE CONTENTS
Introduction
Instructions
Computer Arithmetic
Performance
→ Processor: Datapath
→ Processor: Control
Pipelining Techniques
Memory
Input/Output Devices
PROCESSOR:
DATAPATH & CONTROL
Multi-Cycle Datapath & Control

Control: Finite State Machine (FSM)
Control: Microprogramming
Multicycle Approach
Break up an instruction into steps, each step takes a cycle:
balance the amount of work to be done
restrict each cycle to use only one major functional unit
Different instructions take different numbers of cycles to complete
At the end of a cycle:
store values for use in later cycles (easiest thing to do)
introduce additional “internal” registers for such temporary storage
Reusing functional units (reduces hardware cost):

Use ALU to compute address/result and to increment PC
Use memory for both instructions and data
Multi-Cycle Datapath:
Additional Registers
Additional “internal registers”:
→ Instruction register (IR) -- to hold the current instruction
→ Memory data register (MDR) -- to hold data read from memory
→ A register (A) & B register (B) -- to hold the register operand values read from the register file
→ ALUOut register (ALUOut) -- to hold the output of the ALU; also serves as the memory address register (MAR)
All registers except IR hold data only between a pair of adjacent cycles and thus do not need write control signals; IR holds the instruction until the end of that instruction's execution, hence it needs a write control signal
[Figure: multicycle datapath with the additional internal registers (instruction register, memory data register, A, B, ALUOut); a single memory serves both instructions and data.]
Note: we ignore jump inst here

Multicycle Datapath:
Additional Multiplexors
Additional multiplexors:
→ Mux for first ALU input -- to select A or PC (since we use the ALU for both address/result computation & PC increment)
→ Bigger mux for second ALU input -- due to two additional inputs: 4 (for normal PC increment) and the sign-extended & shifted offset field (in branch address computation)
→ Mux for memory address input -- to select instruction address or data address
[Figure: multicycle datapath showing the additional multiplexors on both ALU inputs and on the memory address input.]
Note: we ignore jump inst here

Multi-Cycle
Datapath & Control
[Figure: complete multi-cycle datapath with the control unit and its control signals.]
Note the reason for each control signal; also note that we have included the jump instruction
Control Signals for
Multi-Cycle Datapath
Note:
→ three possible sources for the value to be written into PC (selected by PCSource): (1) regular increment of PC, (2) conditional branch target from ALUOut, (3) unconditional jump (lower 26 bits of instruction in IR shifted left by 2 and concatenated with upper 4 bits of the incremented PC)
→ two PC write control signals: (1) PCWrite (unconditional write, used for the PC increment and for jump), & (2) PCWriteCond (lets the “zero” signal cause a PC write if asserted during a beq inst.)
→ since memory is used for both inst. & data, need IorD to select the appropriate address
→ IRWrite is needed so that the instruction is written to IR (IRWrite = 1) during the first cycle of the instruction, and so that IR is not overwritten by another instruction during the later cycles of the current instruction's execution (by keeping IRWrite = 0)
→ other control signals
Breaking the Instruction
into 3 - 5 Execution Steps
1. Instruction Fetch (All instructions)
2. Instruction Decode (All instructions), Register Fetch & Branch Address
Computation (in advance, just in case)
3. ALU (R-type) execution, Memory Address Computation, or Branch
Completion (Instruction dependent)
4. Memory Access or R-type Instruction Completion (Instruction dependent)
5. Memory Read Completion (only for lw)
At end of every clock cycle, needed data must be stored into register(s) or memory
location(s).
Each step (can be several parallel operations) is 1 clock cycle --> Instructions take 3
to 5 cycles!
Events during a cycle, e.g.: data ready → operation → result clocked in at the clock edge
Step 1: Instruction Fetch
Use PC to get instruction (from memory) and put it in the Instruction Register
Increment the PC by 4 and put the result back in the PC
Can be described succinctly using RTL "Register-Transfer Language"
IR <= Memory[PC];
PC <= PC + 4;
Which control signals need to be asserted?
→ IorD = 0, MemRead = 1, IRWrite = 1
→ ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCWrite = 1, PCSource = 00
Why can instruction read & PC update be in the same step? Look at state element timing
What is the advantage of updating the PC now?
Step 2: Instruction Decode, Reg.
Fetch, & Branch Addr. Comp.
In this step, we decode the instruction in IR (the opcode enters the control unit in order to generate control signals). In parallel, we can:
Read registers rs and rt, just in case we need them
Compute the branch address, just in case the instruction is a branch (beq)
RTL:
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
Control signals:
→ ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (add)
→ Note: no explicit control signals are needed to write A, B, & ALUOut. They are written automatically by the clock transition at the end of the step
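The register transfers of this step can be traced in a few lines of Python. This is only a sketch: the function names, the register-file list and the sample beq encoding are illustrative assumptions, not part of the slides, and `pc` is taken to be the already incremented PC from step 1.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_step(ir, pc, regs):
    """Return (A, B, ALUOut) as latched at the end of step 2."""
    rs = (ir >> 21) & 0x1F               # IR[25:21]
    rt = (ir >> 16) & 0x1F               # IR[20:16]
    offset = sign_extend16(ir)           # sign-extend(IR[15:0])
    a = regs[rs]                         # A <= Reg[IR[25:21]]
    b = regs[rt]                         # B <= Reg[IR[20:16]]
    alu_out = (pc + (offset << 2)) & 0xFFFFFFFF  # ALUOut <= PC + (offset << 2)
    return a, b, alu_out


# Hypothetical example: beq $10, $11 with a word offset of -3 (0xFFFD).
ir = (0b000100 << 26) | (10 << 21) | (11 << 16) | 0xFFFD
regs = [0] * 32
regs[10], regs[11] = 7, 9
a, b, alu_out = decode_step(ir, 0x00400010, regs)
print(a, b, hex(alu_out))
```

The branch target is computed whether or not the instruction turns out to be a branch; if it is not, ALUOut is simply overwritten in the next step.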
Step 3: Instruction
Dependent Operation
One of four functions, based on instruction type:
Memory address computation (for lw, sw):
ALUOut <= A + sign-extend(IR[15:0]);
→ Control signals: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
ALU (R-type):
ALUOut <= A op B;
→ Control signals: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
Conditional branch:
if (A==B) PC <= ALUOut;
→ Control signals: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (Sub), PCSource = 01, PCWriteCond = 1 (lets Zero enable the PC write when it is 1)
What is the content of ALUOut during this step? Immediately after this step?
Jump:
PC <= PC[31:28] || (IR[25:0]<<2);
→ Control signals: PCSource = 10, PCWrite = 1
Note: Conditional branch & jump instructions are completed at this step!
Step 4: Memory Access or ALU
(R-type) Instruction Completion
For lw or sw instructions (access memory):
MDR <= Memory[ALUOut];
or
Memory[ALUOut] <= B;
→ Control signals (for lw): IorD = 1 (to select ALUOut as address), MemRead = 1. Note that no write signal is needed for MDR; it is written automatically by the clock transition at the end of the step
→ Control signals (for sw): IorD = 1 (to select ALUOut as address), MemWrite = 1
For ALU (R-type) instructions (write result to register):
Reg[IR[15:11]] <= ALUOut;
→ Control signals: RegDst = 1 (to select register address), MemtoReg = 0, RegWrite = 1
→ The write actually takes place at the end of the cycle on the clock edge!
Note: sw and ALU (R-type) instructions are completed at this step!
Step 5: Memory Read
Completion
For lw instruction only (write data from MDR to register):
Reg[IR[20:16]] <= MDR;
Control signals: RegDst = 0 (to select register address), MemtoReg = 1, RegWrite = 1
Note: lw instruction is completed at this step!
Summary of Execution Steps
Step 1, Instruction fetch (all instructions):
IR <= Memory[PC]
PC <= PC + 4
Step 2, Instruction decode/register fetch/branch addr comp (all instructions):
A <= Reg[IR[25:21]]
B <= Reg[IR[20:16]]
ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
Step 3, Execution, address computation, branch/jump completion:
R-type instructions: ALUOut <= A op B
Memory-reference instructions: ALUOut <= A + sign-extend(IR[15:0])
Branches: if (A == B) then PC <= ALUOut
Jumps: PC <= PC[31:28] || (IR[25:0]<<2)
Step 4, Memory access or R-type completion:
R-type instructions: Reg[IR[15:11]] <= ALUOut
Memory-reference instructions: Load: MDR <= Memory[ALUOut] or Store: Memory[ALUOut] <= B
Step 5, Memory read completion:
Memory-reference instructions: Load: Reg[IR[20:16]] <= MDR
Some instructions take fewer cycles than others, so the next instruction can start earlier.
Hence, compared to a single-cycle implementation, where every instruction takes the same amount of time (that of the slowest instruction), a multi-cycle implementation can be faster!
Multi-cycle implementation also reduces hardware cost (fewer adders & memories, at the price of more registers & muxes).
Simple Questions
How many cycles will it take to execute this code?
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label # assume branch not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...
What is going on during the 8th cycle of execution?
In what cycle does the actual addition of $t2 and $t3 take place?
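One way to check answers to these questions is to lay the instructions out cycle by cycle. The sketch below assumes the per-class cycle counts used elsewhere in these slides (lw 5, sw 4, R-type 4, beq 3) and that each instruction starts only after the previous one finishes; the names `CYCLES` and `schedule` are illustrative.

```python
# Cycles per instruction in the multi-cycle implementation.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# The code sequence from the slide, in program order (branch not taken).
program = ["lw", "lw", "beq", "add", "sw"]


def schedule(instructions):
    """Return a list of (cycle, instruction index, mnemonic, step) tuples."""
    timeline = []
    cycle = 0
    for index, op in enumerate(instructions):
        for step in range(1, CYCLES[op] + 1):
            cycle += 1
            timeline.append((cycle, index, op, step))
    return timeline


timeline = schedule(program)
total_cycles = timeline[-1][0]
cycle_8 = timeline[8 - 1]
# Step 3 of the add is where ALUOut <= A op B is computed.
add_cycle = next(c for c, _, op, step in timeline if op == "add" and step == 3)
print(total_cycles, cycle_8, add_cycle)
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is step 3 (memory address computation) of the second lw, and the add's ALU operation happens in cycle 16.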
Defining the Control for
Multi-Cycle Datapath
Multi-cycle vs single-cycle datapath:
for single-cycle, truth tables specify the setting of the control signals based on the instruction
for multi-cycle, control is more complex because each instruction is executed in steps; control must specify both the control signals in any step & the next step in the sequence
Value of control signals dependent upon:
what instruction is being executed
which step is being performed
Two different control techniques:
→ Finite state machine (FSM)
→ Microprogramming
Implementation can be derived from specification
Finite State Machine
(FSM) Control
Consists of a set of states & directions on how to change states
Each state specifies a set of control signal outputs that are
asserted when machine is at that state
Each state in FSM takes 1 clock cycle
First two states (state 0 & state 1) common for all
instructions
After state 1, signals asserted depend on instruction (this
process is called instruction decoding)
After last step (state) of an instruction, FSM returns to state
0 to begin fetching next instruction
The Complete FSM Control
Graphical specification (state diagram), listed state by state; Start enters state 0:
State | Name | Control signals asserted | Next state
0 | Instruction fetch | MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00 | 1
1 | Instruction decode/register fetch | ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 | 2 if (Op = 'LW') or (Op = 'SW'); 6 if (Op = R-type); 8 if (Op = 'BEQ'); 9 if (Op = 'J')
2 | Memory address computation | ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 | 3 if (Op = 'LW'); 5 if (Op = 'SW')
3 | Memory access (lw) | MemRead, IorD = 1 | 4
4 | Write-back step | RegDst = 0, RegWrite, MemtoReg = 1 | 0
5 | Memory access (sw) | MemWrite, IorD = 1 | 0
6 | Execution | ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 | 7
7 | R-type completion | RegDst = 1, RegWrite, MemtoReg = 0 | 0
8 | Branch completion | ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01 | 0
9 | Jump completion | PCWrite, PCSource = 10 | 0
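The next-state behaviour of this FSM can be written as a small function. This is a sketch under the assumption that only the five opcodes shown in the diagram occur; the names `OPCODES`, `next_state` and `state_sequence` are illustrative and not taken from the slides.

```python
# 6-bit MIPS opcodes for the instruction classes handled by the FSM.
OPCODES = {"R-type": 0b000000, "j": 0b000010, "beq": 0b000100,
           "lw": 0b100011, "sw": 0b101011}


def next_state(state, opcode):
    """Next-state function of the 10-state multi-cycle control FSM."""
    if state == 0:                      # fetch always goes to decode
        return 1
    if state == 1:                      # decode dispatches on the opcode
        return {OPCODES["lw"]: 2, OPCODES["sw"]: 2, OPCODES["R-type"]: 6,
                OPCODES["beq"]: 8, OPCODES["j"]: 9}[opcode]
    if state == 2:                      # address computed: lw reads, sw writes
        return 3 if opcode == OPCODES["lw"] else 5
    if state == 3:                      # lw: memory read, then write-back
        return 4
    if state == 6:                      # R-type: execute, then completion
        return 7
    return 0                            # states 4, 5, 7, 8, 9 return to fetch


def state_sequence(opcode):
    """States visited while executing one instruction, starting at state 0."""
    states = [0]
    while True:
        following = next_state(states[-1], opcode)
        if following == 0:
            return states
        states.append(following)


for name, opcode in OPCODES.items():
    print(name, state_sequence(opcode))
```

The length of each sequence is the cycle count of that instruction class, which is why loads take 5 cycles and branches and jumps take 3.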
CPI in Multi-Cycle CPU
Example:
Instruction class | load | store | R-type | branch (cond.) | jumps
gcc instruction mix | 22% | 11% | 49% | 16% | 2%
#cycles | 5 | 4 | 4 | 3 | 3
CPI = 0.22 x 5 + 0.11 x 4 + 0.49 x 4 + 0.16 x 3 + 0.02 x 3
= 1.1 + 0.44 + 1.96 + 0.48 + 0.06 = 4.04
Better than worst case CPI (if all instructions took same number of cycles = 5)
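The same weighted average can be reproduced in a few lines; the dictionary layout and names below are illustrative only, while the numbers are the ones given in the example.

```python
# (fraction of instruction mix, cycles per instruction) for each class.
mix = {
    "load": (0.22, 5),
    "store": (0.11, 4),
    "R-type": (0.49, 4),
    "branch": (0.16, 3),
    "jump": (0.02, 3),
}

# CPI is the frequency-weighted average of the per-class cycle counts.
cpi = sum(fraction * cycles for fraction, cycles in mix.values())
worst_case_cpi = max(cycles for _, cycles in mix.values())
print(f"CPI = {cpi:.2f}, worst case = {worst_case_cpi}")
```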
FSM Controller
Implementation
Typically by a block of combinational logic & a state register to hold the current state
Total of 10 states (0 to 9) --> 4 bit state register
Combinational control logic:
Inputs: current state (S3 - S0, from the state register) & any input used to determine the next state (in this case the 6-bit opcode Op5 - Op0, from the opcode field of the instruction register)
Outputs: next state number (NS3 - NS0) & control signals to be asserted for the current state (PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst)
Note: here the control-signal outputs depend only on the current state, not on the inputs (Moore machine)
[Figure: combinational control logic block fed by the instruction register opcode field and the state register, producing the control signals and the next-state bits.]
PLA Implementation of the
Combinational Control Logic
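If I picked a horizontal or a vertical line, could you explain it?
Note: upper half is AND plane & lower half is OR plane
Example: PCWrite = 1 if (current state is state 0) or (current state is state 9), i.e.,
$PCWrite = \overline{S3} \cdot \overline{S2} \cdot \overline{S1} \cdot \overline{S0} + S3 \cdot \overline{S2} \cdot \overline{S1} \cdot S0$
Example: next state bit 2, NS2 = 1 (i.e. next state is 4, 5, 6, or 7) if (current state is 3) or (current state is 2 and op = 101011 (sw)) or (current state is 1 and op = 000000 (R-type)) or (current state is 6), i.e.,
$NS2 = \overline{S3} \cdot \overline{S2} \cdot S1 \cdot S0$
$\quad + \overline{S3} \cdot \overline{S2} \cdot S1 \cdot \overline{S0} \cdot Op5 \cdot \overline{Op4} \cdot Op3 \cdot \overline{Op2} \cdot Op1 \cdot Op0$
$\quad + \overline{S3} \cdot \overline{S2} \cdot \overline{S1} \cdot S0 \cdot \overline{Op5} \cdot \overline{Op4} \cdot \overline{Op3} \cdot \overline{Op2} \cdot \overline{Op1} \cdot \overline{Op0}$
$\quad + \overline{S3} \cdot S2 \cdot S1 \cdot \overline{S0}$
[Figure: PLA with inputs Op5 - Op0 and S3 - S0 feeding the AND plane; the OR plane drives PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst and NS3 - NS0.]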
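The two example equations can be checked by evaluating them as sum-of-products expressions, where a complemented literal corresponds to a state or opcode bit that must be 0. The sketch below is one way to do that; the helper names `bit`, `inv`, `pc_write` and `ns2` are assumptions made for illustration.

```python
def bit(value, index):
    """Return bit number index (0 = least significant) of value."""
    return (value >> index) & 1


def inv(x):
    """Logical complement of a single bit."""
    return 1 - x


def pc_write(state):
    """PCWrite: one product term for state 0 (0000), one for state 9 (1001)."""
    s3, s2, s1, s0 = (bit(state, i) for i in (3, 2, 1, 0))
    return (inv(s3) & inv(s2) & inv(s1) & inv(s0)) | (s3 & inv(s2) & inv(s1) & s0)


def ns2(state, opcode):
    """Next-state bit 2: four product terms, as in the worked example."""
    s3, s2, s1, s0 = (bit(state, i) for i in (3, 2, 1, 0))
    op5, op4, op3, op2, op1, op0 = (bit(opcode, i) for i in (5, 4, 3, 2, 1, 0))
    in_state_3 = inv(s3) & inv(s2) & s1 & s0
    in_state_2 = inv(s3) & inv(s2) & s1 & inv(s0)
    in_state_1 = inv(s3) & inv(s2) & inv(s1) & s0
    in_state_6 = inv(s3) & s2 & s1 & inv(s0)
    is_sw = op5 & inv(op4) & op3 & inv(op2) & op1 & op0          # 101011
    is_rtype = (inv(op5) & inv(op4) & inv(op3)
                & inv(op2) & inv(op1) & inv(op0))                # 000000
    return in_state_3 | (in_state_2 & is_sw) | (in_state_1 & is_rtype) | in_state_6


states_with_pc_write = [s for s in range(16) if pc_write(s)]
print(states_with_pc_write)
```

Each named intermediate (`in_state_3`, `is_sw`, and so on) corresponds to part of one horizontal product-term line in the AND plane; the final `|` is the OR plane.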
ROM Implementation of
Combinational control logic can be expressed as a truth table: inputs are current state values (S3 - S0) & opcodes (Op5 - Op0); outputs are control signals & next state values (NS3 - NS0)
A ROM can be used to implement a truth table
if the address (inputs) is m bits, we can address 2^m entries in the ROM
outputs are the bits of data that the address points to
Example: a ROM with m = 3 address bits and n = 4 data bits
address | data
0 0 0 | 0 0 1 1
0 0 1 | 1 1 0 0
0 1 0 | 1 1 0 0
0 1 1 | 1 0 0 0
1 0 0 | 0 0 0 0
1 0 1 | 0 0 0 1
1 1 0 | 0 1 1 0
1 1 1 | 0 1 1 1
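In software terms a ROM is just an indexed table: the address selects an entry and the entry's bits are the outputs. A minimal sketch of the 3-bit-address, 4-bit-data example, with the names `ROM` and `rom_read` chosen for illustration:

```python
# Contents of the example ROM: index = 3-bit address, value = 4-bit data word.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address):
    """Return the data word stored at a 3-bit address."""
    return ROM[address & 0b111]


# An m-bit address selects one of 2**m entries.
address_bits = 3
print(len(ROM) == 2 ** address_bits, format(rom_read(0b110), "04b"))
```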
ROM Implementation of
How many inputs are there?
6 bits for opcode, 4 bits for current-state = 10 address lines (i.e., 2^10 = 1024 different addresses)
How many outputs are there?
16 datapath-control outputs, 4 next-state bits = 20 bit outputs
ROM is 2^10 x 20 = 20K bits (and a rather unusual size)
Rather wasteful, since lots of input combinations (addresses) will never occur: e.g. many opcodes are illegal, some states (e.g. states 10 to 15) are illegal
ROM vs. PLA
Break up the table into two parts
- 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
- 10 bits tell you the 4 next state bits: 2^10 x 4 bits of ROM + small circuit
- Total: 4.3K bits of ROM + small circuit
PLA is much smaller
- can share product terms
- only need entries that produce an active output
- can take into account don't cares
Size is (#inputs × #product-terms) + (#outputs × #product-terms)
For this example, PLA size is proportional to (10 x 17) + (20 x 17) = 510 PLA cells
A PLA cell is usually about the size of a ROM cell (bit), only slightly bigger
PLA is a much more efficient implementation for this control unit
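The size comparison is plain arithmetic and can be recomputed directly. The variable names are illustrative; the counts (10 inputs, 16 control outputs, 4 next-state bits, 17 product terms) are the ones quoted above.

```python
opcode_bits, state_bits = 6, 4
control_outputs, next_state_bits = 16, 4
product_terms = 17

inputs = opcode_bits + state_bits                 # 10 address lines
outputs = control_outputs + next_state_bits       # 20 output bits

# One ROM holding the whole truth table.
single_rom_bits = 2 ** inputs * outputs

# Two ROMs: control outputs depend on the state only,
# next-state bits depend on state and opcode.
split_rom_bits = 2 ** state_bits * control_outputs + 2 ** inputs * next_state_bits

# PLA: AND plane (inputs x terms) plus OR plane (outputs x terms).
pla_cells = inputs * product_terms + outputs * product_terms

print(single_rom_bits, split_rom_bits, pla_cells)
```

With 1K = 1024, the single ROM is exactly 20K bits and the split ROMs come to 4.25K bits, which the slide rounds to 4.3K.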
Microprogramming Control
If the assembly language instruction set becomes very large, FSM could require hundreds to thousands of states & many arcs (sequences) -- very complex
→ Complex control better managed by microprogramming
Basic idea:
All control signals in a cycle form a microinstruction; each microinstruction defines:
the set of datapath control signals that must be asserted in a given state (cycle)
the next microinstruction
Executing a microinstruction = asserting the control signals specified
A sequence of microinstructions forms a microprogram
Each cycle, a microinstruction is fetched from the microprogram & executed
Microprogramming -- designing the control as a program that implements machine instructions by simpler microinstructions
Each control state corresponds to a microinstruction
Our basic FSM: 10 states → 10 micro-instructions
Microinstruction Format
A microinstruction contains several fields + 1 label
Each field specifies a non-overlapping set of control signals
Signals that are never asserted simultaneously may share the same field
A last field specifies how to choose the next microinstruction
Label: some micro-instructions have a label that other microinstructions can branch to
In our example, we have 7 fields + 1 label
1st to 6th fields: control specification; 7th field: next instruction
Field name | Control signals
1. ALU control | Define operation of ALU
2. SRC1 | Specify source for 1st ALU operand
3. SRC2 | Specify source for 2nd ALU operand
4. Register control | Specify read or write for register file, and source of value for a write
5. Memory | Specify read or write, and the source for memory. For a read, specify destination register
6. PCWrite control | Specify the writing of PC
7. Sequencing | Specify how to choose next microinstruction
A Microprogram
Control Unit
Microinstructions are placed in a ROM or PLA
The state (in state register) enters as input or address to define the current microinstruction, which in turn asserts the relevant control signals
State change at the edge of clock
Sequencing: ways to choose next microinstruction (next state):
→ increment current address/state (AddrCtl selects +1 adder) (Seq)
→ branch to microinstruction that begins execution of the next MIPS instruction (AddrCtl selects address 0) (Fetch)
→ choose next microinstruction based on opcode (AddrCtl selects dispatch table) (Dispatch)
[Figure: control unit built from a PLA or ROM whose outputs are PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst and AddrCtl; a state register, a +1 adder and address select logic driven by Op[5-0] from the instruction register opcode field choose the next state.]
A Review of Our
State Diagram
[Figure: the complete FSM state diagram (states 0 to 9), repeated from “The Complete FSM Control”.]
Sequencing:
Address Select Logic
Dispatch ROM 1
Op | Opcode name | Value
000000 | R-format | 0110
000010 | jmp | 1001
000100 | beq | 1000
100011 | lw | 0010
101011 | sw | 0010
Dispatch ROM 2
Op | Opcode name | Value
100011 | lw | 0011
101011 | sw | 0101
State number | Address-control action | Value of AddrCtl
0 | Use incremented state | 3
1 | Use dispatch ROM 1 | 1
2 | Use dispatch ROM 2 | 2
4 | Replace state number by 0 | 0
9 | Replace state number by 0 | 0
[Figure: address select logic: a 4-input mux controlled by AddrCtl chooses among 0 (input 0), dispatch ROM 1 (input 1), dispatch ROM 2 (input 2) and the output of the state + 1 adder (input 3); both dispatch ROMs are indexed by the opcode field of the instruction register.]
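The address select logic can be sketched as a table lookup followed by a 4-way choice. The dispatch ROM contents and the AddrCtl values for states 0, 1, 2, 4 and 9 come from the tables above; the entries for states 3, 5, 6, 7 and 8 are not listed on the slide and are filled in here as an assumption read off the state diagram. All names are illustrative.

```python
DISPATCH_ROM_1 = {0b000000: 0b0110, 0b000010: 0b1001, 0b000100: 0b1000,
                  0b100011: 0b0010, 0b101011: 0b0010}
DISPATCH_ROM_2 = {0b100011: 0b0011, 0b101011: 0b0101}

# AddrCtl per state: 3 = increment, 1 = dispatch ROM 1, 2 = dispatch ROM 2,
# 0 = go back to state 0.
ADDR_CTL = {0: 3, 1: 1, 2: 2, 4: 0, 9: 0,     # listed on the slide
            3: 3, 5: 0, 6: 3, 7: 0, 8: 0}     # assumed from the state diagram


def next_micro_address(state, opcode):
    """Model of the mux that picks the next state / microinstruction address."""
    addr_ctl = ADDR_CTL[state]
    if addr_ctl == 0:
        return 0
    if addr_ctl == 1:
        return DISPATCH_ROM_1[opcode]
    if addr_ctl == 2:
        return DISPATCH_ROM_2[opcode]
    return state + 1


def trace(opcode):
    """Microinstruction addresses visited for one instruction."""
    addresses = [0]
    while True:
        following = next_micro_address(addresses[-1], opcode)
        if following == 0:
            return addresses
        addresses.append(following)


print(trace(0b100011), trace(0b000100))
```

Compared with a full next-state truth table, only two small ROMs depend on the opcode; every other transition is either "add 1" or "go to 0".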
A Microprogram
Control Unit
A microprogram control unit controlling the datapath
→ ROM or PLA is now microcode memory (control memory)
→ state register is now microprogram counter (µPC)
[Figure: control unit with microcode memory (microcode storage) whose outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite and RegDst drive the datapath; AddrCtl goes to the sequencer, which consists of the microprogram counter, a +1 adder and address select logic fed by Op[5-0] from the instruction register opcode field.]
A Review of
Datapath & Control
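[Figure: complete multi-cycle datapath with the control unit and its control signals, repeated for review.]
Note the reason for each control signal; also note that we have included the jump instruction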
A Review of the Instruction
Execution Steps
1. IR <= Memory[PC]; PC <= PC + 4; (State 0)
2. Instruction Decode (All instructions): (State 1)
A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
3. Memory address computation (for lw, sw): ALUOut <= A + sign-extend(IR[15:0]); (State 2)
ALU (R-type): ALUOut <= A op B; (State 6)
Conditional branch: if (A==B) then PC <= ALUOut; (State 8)
Jump: PC <= PC[31:28] || (IR[25:0]<<2); (State 9)
4. For lw or sw instructions (access memory):
MDR <= Memory[ALUOut] (State 3) or Memory[ALUOut] <= B; (State 5)
For ALU (R-type) instructions (write result to register): Reg[IR[15:11]] <= ALUOut; (State 7)
5. For lw instruction only (write data from MDR to register): Reg[IR[20:16]] <= MDR; (State 4)
A Symbolic Microprogram
A specification methodology
appropriate if hundreds of opcodes, modes, cycles, etc.
signals specified symbolically using microinstructions
E.g. Read PC = Read memory using PC as address and write result into IR (& MDR) (see
next slide for details)
Our symbolic microprogram with 10 microinstructions:
Label | ALU control | SRC1 | SRC2 | Register control | Memory | PCWrite control | Sequencing
Fetch | Add | PC | 4 | | Read PC | ALU | Seq
 | Add | PC | Extshft | Read | | | Dispatch 1
Mem1 | Add | A | Extend | | | | Dispatch 2
LW2 | | | | | Read ALU | | Seq
 | | | | Write MDR | | | Fetch
SW2 | | | | | Write ALU | | Fetch
Rformat1 | Func code | A | B | | | | Seq
 | | | | Write ALU | | | Fetch
BEQ1 | Subt | A | B | | | ALUOut-cond | Fetch
JUMP1 | | | | | | Jump address | Fetch
Microassembler: performs checks to remove combinations that cannot be supported in datapath
Control Signals for Each Symbol
in Each Field in the Microprogram
Field name | Value | Signals active | Comment
ALU control | Add | ALUOp = 00 | Cause the ALU to add.
ALU control | Subt | ALUOp = 01 | Cause the ALU to subtract; this implements the compare for branches.
ALU control | Func code | ALUOp = 10 | Use the instruction's function code to determine ALU control.
SRC1 | PC | ALUSrcA = 0 | Use the PC as the first ALU input.
SRC1 | A | ALUSrcA = 1 | Register A is the first ALU input.
SRC2 | B | ALUSrcB = 00 | Register B is the second ALU input.
SRC2 | 4 | ALUSrcB = 01 | Use 4 as the second ALU input.
SRC2 | Extend | ALUSrcB = 10 | Use output of the sign extension unit as the second ALU input.
SRC2 | Extshft | ALUSrcB = 11 | Use the output of the shift-by-two unit as the second ALU input.
Register control | Read | | Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B.
Register control | Write ALU | RegWrite = 1, RegDst = 1, MemtoReg = 0 | Write a register using the rd field of the IR as the register number and the contents of the ALUOut as the data.
Register control | Write MDR | RegWrite = 1, RegDst = 0, MemtoReg = 1 | Write a register using the rt field of the IR as the register number and the contents of the MDR as the data.
Memory | Read PC | MemRead = 1, IorD = 0, IRWrite = 1 | Read memory using the PC as address; write result into IR (and the MDR).
Memory | Read ALU | MemRead = 1, IorD = 1 | Read memory using the ALUOut as address; write result into MDR.
Memory | Write ALU | MemWrite = 1, IorD = 1 | Write memory using the ALUOut as address, contents of B as the data.
PC write control | ALU | PCSource = 00, PCWrite = 1 | Write the output of the ALU into the PC.
PC write control | ALUOut-cond | PCSource = 01, PCWriteCond = 1 | If the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
PC write control | Jump address | PCSource = 10, PCWrite = 1 | Write the PC with the jump address from the instruction.
Sequencing | Seq | AddrCtl = 11 | Choose the next microinstruction sequentially.
Sequencing | Fetch | AddrCtl = 00 | Go to the first microinstruction to begin a new instruction.
Sequencing | Dispatch 1 | AddrCtl = 01 | Dispatch using the ROM 1.
Sequencing | Dispatch 2 | AddrCtl = 10 | Dispatch using the ROM 2.
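The table above is essentially the lookup a microassembler performs when it turns symbolic field values into control signals. The sketch below shows that translation for one microinstruction; the dictionary layout, the keyword names (`alu`, `src1`, ...) and the `assemble` function are assumptions made for illustration, not an actual microassembler.

```python
# Symbol -> control signals, one sub-table per microinstruction field.
FIELD_SIGNALS = {
    "alu": {"Add": {"ALUOp": 0b00}, "Subt": {"ALUOp": 0b01},
            "Func code": {"ALUOp": 0b10}},
    "src1": {"PC": {"ALUSrcA": 0}, "A": {"ALUSrcA": 1}},
    "src2": {"B": {"ALUSrcB": 0b00}, "4": {"ALUSrcB": 0b01},
             "Extend": {"ALUSrcB": 0b10}, "Extshft": {"ALUSrcB": 0b11}},
    "register": {"Read": {},
                 "Write ALU": {"RegWrite": 1, "RegDst": 1, "MemtoReg": 0},
                 "Write MDR": {"RegWrite": 1, "RegDst": 0, "MemtoReg": 1}},
    "memory": {"Read PC": {"MemRead": 1, "IorD": 0, "IRWrite": 1},
               "Read ALU": {"MemRead": 1, "IorD": 1},
               "Write ALU": {"MemWrite": 1, "IorD": 1}},
    "pcwrite": {"ALU": {"PCSource": 0b00, "PCWrite": 1},
                "ALUOut-cond": {"PCSource": 0b01, "PCWriteCond": 1},
                "Jump address": {"PCSource": 0b10, "PCWrite": 1}},
    "seq": {"Seq": {"AddrCtl": 0b11}, "Fetch": {"AddrCtl": 0b00},
            "Dispatch 1": {"AddrCtl": 0b01}, "Dispatch 2": {"AddrCtl": 0b10}},
}


def assemble(**fields):
    """Translate symbolic field values into a dict of control signals."""
    signals = {}
    for field, symbol in fields.items():
        signals.update(FIELD_SIGNALS[field][symbol])
    return signals


# The first microinstruction of the symbolic microprogram (label Fetch).
fetch = assemble(alu="Add", src1="PC", src2="4",
                 memory="Read PC", pcwrite="ALU", seq="Seq")
print(fetch)
```

The resulting signals for Fetch match those asserted in state 0 of the FSM, which is the sense in which each control state corresponds to one microinstruction.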
Maximally vs Minimally
Encoded
No encoding of control signals in microinstruction format (horizontal microprogram):
1 bit for each control signal in datapath operation; e.g. control signals s, t, u, v, w, x, y, z will occupy 8 bits in microinstruction
faster, but requires more memory (logic)
used for the VAX 780: an astonishing 400K of control memory!
Lots of encoding of control signals in microinstruction format (vertical microprogram):
E.g. s, t, u, v, w, x, y, z will be encoded in, say, 4 bits, with 0000 meaning u = 1 (others = 0), 1010 meaning u = w = 1 (others = 0), etc., i.e. each combination that is actually used gets its own code
send the microinstructions through logic to get control signals
uses less memory, but slower
Select a good trade-off
Microcode implementation: on-chip vs off-chip
Exceptions
Exception: unexpected event from within the processor (e.g. arithmetic overflow)
Interrupt: “unexpected” event from outside of the processor (e.g. from an I/O device)
Event | Source | Term
I/O device request | External | Interrupt
Invoke OS from user program | Internal | Exception
Arithmetic overflow | Internal | Exception
Using undefined instruction | Internal | Exception
Hardware malfunctions | Either | Exception or interrupt
An exception or an interrupt causes an unexpected change in control flow: How does the control unit handle an exception/interrupt?
→ In case of an exception, processor should:
→ save address of the offending instruction in exception program counter (EPC)
→ indicate the reason for exception in Cause register (status register)
→ transfer control to operating system at some specified address (the OS can then provide some service: taking predefined action in response to overflow or stopping the program & reporting an error). If OS continues program execution, it uses EPC to determine where to restart
→ Another way is vectored interrupts:
→ the address to which control is transferred is determined by cause of the exception
Exceptions Handling
by Control Unit
→ Control unit:
→ two more control signals: EPCWrite & CauseWrite; also IntCause
→ modify the mux to PC to a 4-way mux to allow the exception address into PC (the exception address is the OS entry point for exception handling, and is 8000 0180 hex for MIPS)
→ To handle two types of exceptions: undefined instruction & arithmetic overflow
→ add two states in state diagram to do the above: one when no state is defined for the op value at state 1 (then → state 10), the other when overflow is detected from ALU in state 7 (then → state 11)
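What the two extra states do to the processor state can be sketched as follows. The cause codes (0 for undefined instruction, 1 for overflow), the dictionary-based CPU state and the function name are assumptions for illustration; the sketch also assumes the PC has already been incremented by 4 in the fetch step, so the offending instruction's address is PC - 4.

```python
EXCEPTION_ADDRESS = 0x80000180     # OS entry point given on the slide

# Hypothetical IntCause encodings, chosen only for this sketch.
UNDEFINED_INSTRUCTION = 0
ARITHMETIC_OVERFLOW = 1


def take_exception(cpu, int_cause):
    """Return the CPU state after state 10 or state 11 has executed."""
    after = dict(cpu)
    # EPCWrite: save the address of the offending instruction.
    after["EPC"] = (cpu["PC"] - 4) & 0xFFFFFFFF
    # CauseWrite: record why the exception happened (selected by IntCause).
    after["Cause"] = int_cause
    # PCWrite with the 4th PC mux input: jump to the OS handler.
    after["PC"] = EXCEPTION_ADDRESS
    return after


# Example: overflow detected for the instruction fetched from 0x00400024.
before = {"PC": 0x00400028, "EPC": 0, "Cause": 0}
after = take_exception(before, ARITHMETIC_OVERFLOW)
print({name: hex(value) for name, value in after.items()})
```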
Chapter Summary
Part 1:
Elements of datapath: instruction subset, resources, clocking method
Datapath for different instruction classes
Building single-cycle datapath: multiplexors, functional units, control signals
Single-cycle datapath control unit logic: ALU control, main control
Single-cycle datapath & control: complete picture, critical path, problems
Part 2:
Multi-cycle datapath: approach, additional registers & multiplexors, control signals
Breaking instructions into execution steps
Multi-cycle datapath & control: complete picture
Finite state machine (FSM) (hardwired) control & controller implementation
Microprogramming: control, microinstruction format, controller implementation,
symbolic microprogram & its control signals, issues
Exception Handling