Computer Arithmetic
Performance
-> Processor: Datapath
-> Processor: Control
Pipelining Techniques
Memory
Input/Output Devices
PROCESSOR:
DATAPATH & CONTROL
Multicycle Approach
Break up an instruction into steps, each step takes a cycle:
balance the amount of work to be done
restrict each cycle to use only one major functional unit
Different instructions take different numbers of cycles to complete
At the end of a cycle:
store values for use in later cycles (easiest thing to do)
introduce additional "internal" registers for such temporary storage
Multi-Cycle Datapath:
Additional Registers
Additional "internal registers":
- Instruction register (IR) -- to hold current instruction
- Memory data register (MDR) -- to hold data read from memory
- A register (A) & B register (B) -- to hold register operand values from the register file
- ALUOut register (ALUOut) -- to hold output of ALU, also serves as memory address register (MAR)
All registers except IR hold data only between a pair of adjacent cycles and thus do not need write
control signals; IR holds the instruction until the end of its execution, hence needs a write control signal
[Figure: multi-cycle datapath with the added internal registers IR, MDR, A, B, and ALUOut]
[Figure: multi-cycle datapath, repeated for the control-signal discussion]
Note the reason for each control signal; also note that we have included the jump instruction
Control Signals for
Multi-Cycle Datapath
Note:
- three possible sources for the value to be written into PC (controlled by
PCSource): (1) regular increment of PC, (2) conditional branch target from
ALUOut, (3) unconditional jump (lower 26 bits of instruction in IR shifted
left by 2 and concatenated with upper 4 bits of the incremented PC)
- two PC write control signals: (1) PCWrite (for unconditional jump), & (2)
PCWriteCond (for the "zero" signal to cause a PC write if asserted during a beq
instruction)
- since memory is used for both instructions & data, need IorD to select the appropriate
address
- IRWrite needed for IR so that the instruction is written to IR (IRWrite = 1)
during the first cycle of the instruction and so that IR is not
overwritten by another instruction during the later cycles of the current
instruction's execution (by keeping IRWrite = 0)
- other control signals
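The PC update rule in the notes above can be sketched in Python. This is only an illustration under assumptions: the function name `next_pc` and its parameter names are made up for this sketch, and the PCSource encodings (00, 01, 10) follow the values used elsewhere in these slides.

```python
def next_pc(pc, alu_result, alu_out, ir, pc_source, pc_write, pc_write_cond, zero):
    """Return the PC value after the clock edge of one cycle (hypothetical helper)."""
    # Jump target: lower 26 bits of IR shifted left by 2, concatenated with
    # the upper 4 bits of the (already incremented) PC.
    jump_target = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    sources = {0b00: alu_result, 0b01: alu_out, 0b10: jump_target}
    # PC is written on PCWrite, or on PCWriteCond when the ALU Zero output is set.
    write_enable = pc_write or (pc_write_cond and zero)
    return sources[pc_source] & 0xFFFFFFFF if write_enable else pc
```

With neither write signal asserted the function returns the old PC, which models a cycle in which PC is simply held.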
Breaking the Instruction
into 3 - 5 Execution Steps
1. Instruction Fetch (All instructions)
2. Instruction Decode (All instructions), Register Fetch & Branch Address
Computation (in advance, just in case)
3. ALU (R-type) execution, Memory Address Computation, or Branch
Completion (Instruction dependent)
4. Memory Access or R-type Instruction Completion (Instruction dependent)
5. Memory Read Completion (only for lw)
At end of every clock cycle, needed data must be stored into register(s) or memory
location(s).
Each step (can be several parallel operations) is 1 clock cycle --> Instructions take 3
to 5 cycles!
Events during a cycle, e.g. instruction fetch:
IR <= Memory[PC];
PC <= PC + 4;
Why can instruction read & PC update be in the same step? Look at state element
timing
Step 2: Instruction Decode, Reg.
Fetch, & Branch Addr. Comp.
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
Control signals:
- ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (add)
- Note: no explicit control signals needed to write A, B, & ALUOut. They are
written by clock transitions automatically at end of step
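As a rough software model of Step 2, the sketch below reads the two register operands and computes the speculative branch target. The helper names `sign_extend16` and `decode_step`, and the use of a plain Python list as the register file, are assumptions for illustration only.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_step(pc, ir, regs):
    """Model A <= Reg[IR[25:21]], B <= Reg[IR[20:16]], ALUOut <= PC + (sext(IR[15:0]) << 2)."""
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    # pc is the already incremented PC from Step 1; result is kept to 32 bits.
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    return a, b, alu_out
```

The branch target is computed for every instruction, whether or not it turns out to be a branch, because the ALU is otherwise idle in this step.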
Step 3: Instruction
Dependent Operation
One of four functions, based on instruction type:
- Control signals (for lw): IorD = 1 (to select ALUOut as address), MemRead = 1. Note
that no write signal is needed for writing to MDR; it is written by the clock transition automatically
at the end of the step
- Control signals (for sw): IorD = 1 (to select ALUOut as address), MemWrite = 1
+ The write actually takes place at the end of the cycle on the clock edge!
Note: sw and ALU (R-type) instructions are completed at this step!
Step 5: Memory Read
Completion
For lw instruction only (write data from MDR to register):
Reg[IR[20:16]]<= MDR;
Summary of Execution Steps
Actions in each step, by instruction class:
Instruction fetch (all instructions):
    IR <= Memory[PC]
    PC <= PC + 4
Instruction decode / register fetch / branch address computation (all instructions):
    A <= Reg[IR[25:21]]
    B <= Reg[IR[20:16]]
    ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
Execution, address computation, branch/jump completion:
    R-type: ALUOut <= A op B
    Memory reference: ALUOut <= A + sign-extend(IR[15:0])
    Branch: if (A == B) then PC <= ALUOut
    Jump: PC <= PC[31:28] || (IR[25:0] << 2)
Memory access or R-type completion:
    R-type: Reg[IR[15:11]] <= ALUOut
    Load: MDR <= Memory[ALUOut]
    Store: Memory[ALUOut] <= B
Memory read completion:
    Load: Reg[IR[20:16]] <= MDR
Some instructions take fewer cycles than others, so the next instruction can start earlier.
Hence, compared to a single-cycle implementation, where every instruction takes the same amount of time (that of
the slowest instruction), a multi-cycle implementation can be faster!
Multi-cycle implementation also reduces hardware cost (fewer adders & memories, at the price of more
registers & muxes).
Simple Questions
How many cycles will it take to execute this code?
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label # assume branch not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...
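One way to check an answer to the question above is to add up the per-instruction cycle counts. The sketch below assumes the counts used in these slides (lw 5, sw 4, R-type 4, beq 3, j 3); the names `CYCLES` and `total_cycles` are hypothetical.

```python
# Cycles per instruction in the multi-cycle implementation (from the execution steps).
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}


def total_cycles(program):
    """Sum the cycle counts of a straight-line sequence of mnemonics."""
    return sum(CYCLES[mnemonic] for mnemonic in program)


# The branch is assumed not taken, so all five instructions execute.
program = ["lw", "lw", "beq", "add", "sw"]
print(total_cycles(program))  # 5 + 5 + 3 + 4 + 4 = 21
```

Note that beq costs 3 cycles whether or not it is taken; the not-taken assumption only determines that add and sw are executed.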
Defining the Control for
Multi-Cycle Datapath
Multi-cycle vs single-cycle datapath:
for single-cycle, truth tables specify the setting of control signals based on the
instruction
for multi-cycle, control is more complex because each instruction is executed in steps;
control must specify both the control signals in any step & the next step in the
sequence
Value of control signals dependent upon:
what instruction is being executed
which step is being performed
- Microprogramming
Finite State Machine
(FSM) Control
The Complete FSM Control
Graphical specification (state number, name, control signals asserted, next state):
State 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00; next: state 1
State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00; next: state 2 if (Op = 'LW') or (Op = 'SW'), state 6 if (Op = R-type), state 8 if (Op = 'BEQ'), state 9 if (Op = 'J')
State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00; next: state 3 if (Op = 'LW'), state 5 if (Op = 'SW')
State 3, Memory access: MemRead, IorD = 1; next: state 4
State 4, Write-back step: RegDst = 0, RegWrite, MemtoReg = 1; next: state 0
State 5, Memory access: MemWrite, IorD = 1; next: state 0
State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10; next: state 7
State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0; next: state 0
State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01; next: state 0
State 9, Jump completion: PCWrite, PCSource = 10; next: state 0
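The next-state behavior of the state diagram can be sketched as a small Python function. This is an illustrative model only: `next_state`, `trace`, and the two dispatch dictionaries are names invented for this sketch, and the opcode values are the 6-bit MIPS opcodes for R-type, j, beq, lw, and sw.

```python
R_TYPE, J, BEQ, LW, SW = 0b000000, 0b000010, 0b000100, 0b100011, 0b101011

# Opcode-dependent transitions out of state 1 (decode) and state 2 (address computation).
DECODE_DISPATCH = {R_TYPE: 6, J: 9, BEQ: 8, LW: 2, SW: 2}
MEMORY_DISPATCH = {LW: 3, SW: 5}


def next_state(state, opcode):
    """Return the state that follows `state` for the given opcode."""
    if state == 0:
        return 1
    if state == 1:
        return DECODE_DISPATCH[opcode]
    if state == 2:
        return MEMORY_DISPATCH[opcode]
    if state in (3, 6):
        return state + 1  # 3 -> 4 (write-back), 6 -> 7 (R-type completion)
    return 0              # states 4, 5, 7, 8, 9 return to instruction fetch


def trace(opcode):
    """List the states visited by one instruction, starting from fetch."""
    states = [0]
    while True:
        following = next_state(states[-1], opcode)
        if following == 0:
            return states
        states.append(following)
```

The length of `trace(opcode)` is the number of cycles the instruction takes, for example `trace(LW)` visits five states.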
CPI in Multi-Cycle CPU
Example:
Instruction class     load   store   R-type   branch (cond.)   jump
gcc instruction mix   22%    11%     49%      16%              2%
#cycles               5      4       4        3                3
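The average CPI for this mix is the weighted sum of the cycle counts. A minimal sketch, assuming the frequencies and cycle counts in the example above (the names `MIX` and `average_cpi` are made up for illustration):

```python
# (fraction of executed instructions, cycles) per instruction class, from the gcc example.
MIX = {
    "load": (0.22, 5),
    "store": (0.11, 4),
    "R-type": (0.49, 4),
    "branch": (0.16, 3),
    "jump": (0.02, 3),
}


def average_cpi(mix):
    """Weighted average of cycles per instruction."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


print(round(average_cpi(MIX), 2))  # 1.10 + 0.44 + 1.96 + 0.48 + 0.06 = 4.04
```

The result, about 4.04, is below the worst case of 5 cycles because only loads need all five steps.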
FSM Controller
Implementation
Typically by a block of combinational logic & a state register to hold the current state
Total of 10 states (0 to 9) --> 4-bit state register
Combinational control logic:
Inputs: current state & any input used to determine the next state (in this case the 6-bit opcode)
Outputs: next state number & control signals to be asserted for current state
Note: here the control-signal outputs depend only on the current state, not on the inputs (Moore machine)
[Figure: combinational control logic block. Inputs: Op5-Op0 from the instruction register opcode field and S3-S0 from the state register. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and next-state bits NS3-NS0, which feed the state register.]
PLA Implementation of the
Combinational Control Logic
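If I picked a horizontal or a vertical line, could you explain it?
Note: upper half is AND plane & lower half is OR plane
Example: PCWrite = 1 if (current state is state 0) or (current state is state 9), i.e.,
$PCWrite = \overline{S_3} \cdot \overline{S_2} \cdot \overline{S_1} \cdot \overline{S_0} + S_3 \cdot \overline{S_2} \cdot \overline{S_1} \cdot S_0$
Example: next state bit 2 NS2 = 1 (i.e. next state is 4, 5, 6, or 7) if (current state is 3) or (current
state is 2 and op = 101011 (sw)) or (current state is 1 and op = 000000 (R-type)) or
(current state is 6), i.e.,
$NS_2 = \overline{S_3} \cdot \overline{S_2} \cdot S_1 \cdot S_0$
$\quad + \overline{S_3} \cdot \overline{S_2} \cdot S_1 \cdot \overline{S_0} \cdot Op_5 \cdot \overline{Op_4} \cdot Op_3 \cdot \overline{Op_2} \cdot Op_1 \cdot Op_0$
$\quad + \overline{S_3} \cdot \overline{S_2} \cdot \overline{S_1} \cdot S_0 \cdot \overline{Op_5} \cdot \overline{Op_4} \cdot \overline{Op_3} \cdot \overline{Op_2} \cdot \overline{Op_1} \cdot \overline{Op_0}$
$\quad + \overline{S_3} \cdot S_2 \cdot S_1 \cdot \overline{S_0}$
[Figure: PLA with inputs Op5-Op0 and S3-S0; outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0]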
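The two sum-of-products examples can be checked bit by bit in Python. The sketch below is an assumption-laden illustration: the helpers `bits`, `pc_write`, and `ns2` are invented names, and the complemented literals are derived from the state numbers and opcodes quoted in the examples (state 0 = 0000, state 9 = 1001, sw = 101011, R-type = 000000).

```python
def bits(value, width):
    """Return the bits of value as a list, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def pc_write(state):
    """PCWrite = 1 in state 0 (0000) or state 9 (1001)."""
    s3, s2, s1, s0 = bits(state, 4)
    state0 = not s3 and not s2 and not s1 and not s0
    state9 = s3 and not s2 and not s1 and s0
    return int(bool(state0 or state9))


def ns2(state, opcode):
    """Next-state bit 2 as the OR of four product terms."""
    s3, s2, s1, s0 = bits(state, 4)
    op5, op4, op3, op2, op1, op0 = bits(opcode, 6)
    in_state3 = not s3 and not s2 and s1 and s0
    state2_sw = (not s3 and not s2 and s1 and not s0
                 and op5 and not op4 and op3 and not op2 and op1 and op0)
    state1_rtype = (not s3 and not s2 and not s1 and s0
                    and not (op5 or op4 or op3 or op2 or op1 or op0))
    in_state6 = not s3 and s2 and s1 and not s0
    return int(bool(in_state3 or state2_sw or state1_rtype or in_state6))
```

Each local variable corresponds to one product term, that is, one vertical line through the AND plane of the PLA.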
ROM Implementation of
Combinational Control Logic
Combinational control logic can be expressed in a truth table: inputs are current
state values (S3 - S0) & opcode bits (Op5 - Op0); outputs are control signals &
next state values (NS3 - NS0)
A ROM can be used to implement a truth table
if the address (inputs) is m bits, we can address 2^m entries in the ROM
outputs are the bits of data that the address points to
ROM Implementation of
Combinational Control Logic
How many inputs are there?
6 bits for opcode, 4 bits for current state = 10 address lines
(i.e., 2^10 = 1024 different addresses)
How many outputs are there?
16 datapath-control outputs, 4 next-state bits = 20 output bits
ROM vs. PLA
Break up the table into two parts
- 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
- 10 bits tell you the 4 next state bits: 2^10 x 4 bits of ROM + small circuit
- Total: 4.3K bits of ROM + small circuit
PLA is much smaller
- can share product terms
- only need entries that produce an active output
- can take into account don't cares
Size is (#inputs × #product-terms) + (#outputs × #product-terms)
For this example, PLA size is proportional to (10 x 17) + (20 x 17) = 510 PLA cells
A PLA cell is usually about the size of a ROM cell (bit), only slightly bigger
PLA is a much more efficient implementation for this control unit
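The size comparison is plain arithmetic and can be reproduced as follows. The variable names are hypothetical; the input, output, and product-term counts are the ones quoted above, and the single-ROM figure is computed from the 10 address lines and 20 output bits.

```python
ADDRESS_BITS = 10      # 6 opcode bits + 4 state bits
STATE_BITS = 4
CONTROL_OUTPUTS = 16
NEXT_STATE_BITS = 4
PRODUCT_TERMS = 17

# One ROM holding every output for every address.
single_rom_bits = 2**ADDRESS_BITS * (CONTROL_OUTPUTS + NEXT_STATE_BITS)

# Two ROMs: control outputs indexed by state only, next state indexed by state + opcode.
split_rom_bits = 2**STATE_BITS * CONTROL_OUTPUTS + 2**ADDRESS_BITS * NEXT_STATE_BITS

# PLA: (#inputs x #product-terms) + (#outputs x #product-terms).
pla_cells = (ADDRESS_BITS * PRODUCT_TERMS
             + (CONTROL_OUTPUTS + NEXT_STATE_BITS) * PRODUCT_TERMS)

print(single_rom_bits, split_rom_bits, pla_cells)  # 20480 4352 510
```

The split ROM works out to 4352 bits, which is the "4.3K bits" quoted above, and the PLA needs roughly an eighth as many cells.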
Microprogramming Control
If the assembly language instruction set becomes very large, the FSM could require
hundreds to thousands of states & many arcs (sequences) -- very complex
--> Complex control is better managed by microprogramming
Basic idea:
All control signals in a cycle form a microinstruction; each microinstruction defines:
the set of datapath control signals that must be asserted in a given state (cycle)
the next microinstruction
Executing a microinstruction = asserting the control signals specified
A sequence of microinstructions forms a microprogram
Each cycle, a microinstruction is fetched from the microprogram & executed
Microprogramming -- designing the control as a program that implements machine
instructions with simpler microinstructions
Each control state corresponds to a microinstruction
Our basic FSM: 10 states → 10 microinstructions
Microinstruction Format
A microinstruction contains several fields + 1 label
Each field specifies a non-overlapping set of control signals
Signals that are never asserted simultaneously may share the same field
A last field specifies how to choose the next microinstruction
Label: some microinstructions have a label so that they can be branched to
In our example, we have 7 fields + 1 label
1st to 6th fields: control specification; 7th field: next instruction
A Microprogram
Control Unit
Microinstructions are placed in a ROM or PLA
The state (in the state register) enters as input or address to define the
current microinstruction, which in turn asserts the relevant control signals
The next microinstruction is selected by AddrCtl, e.g.:
- go back to the first microinstruction (state 0) (Fetch)
- choose next microinstruction based on opcode (AddrCtl selects dispatch table) (Dispatch)
[Figure: microprogram control unit. A PLA or ROM holds the microcode; its inputs are the state register and Op[5-0] from the instruction register opcode field; its outputs are the datapath control signals and AddrCtl.]
A Review of Our
State Diagram
[Figure: the FSM state diagram from "The Complete FSM Control" (states 0 to 9), repeated for reference]
Sequencing:
Address Select Logic
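[Figure: address select logic. A 4-input mux controlled by AddrCtl chooses the next state from: 0 (input 0), Dispatch ROM 1 (input 1), Dispatch ROM 2 (input 2), or the adder output, current state + 1 (input 3). The dispatch ROMs are indexed by Op, the opcode field of the instruction register.]
Dispatch ROM 1
Op       Opcode name   Value
000000   R-format      0110
000010   jmp           1001
000100   beq           1000
100011   lw            0010
101011   sw            0010
Dispatch ROM 2
Op       Opcode name   Value
101011   sw            0101
[Figure: microprogrammed control unit. Microcode storage addressed by the microprogram counter; an adder and the address select logic form the sequencer, with Op[5-0] from the instruction register opcode field as input.]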
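The address select logic amounts to a 4-way choice driven by AddrCtl. The Python sketch below is illustrative only: the function name is made up, the AddrCtl encodings (0 = fetch, 1 = dispatch ROM 1, 2 = dispatch ROM 2, 3 = sequential) are the ones used for the sequencing field in these slides, and the lw entry of dispatch ROM 2 is an assumption inferred from the state diagram (state 2 goes to state 3 for lw).

```python
# Dispatch tables: opcode -> next state number.
DISPATCH_ROM_1 = {0b000000: 6, 0b000010: 9, 0b000100: 8, 0b100011: 2, 0b101011: 2}
DISPATCH_ROM_2 = {
    0b100011: 3,  # lw: assumed entry, inferred from the state diagram
    0b101011: 5,  # sw
}


def next_micro_address(addr_ctl, state, opcode):
    """Select the next microinstruction address from the AddrCtl field."""
    if addr_ctl == 0:
        return 0                       # Fetch: start a new instruction
    if addr_ctl == 1:
        return DISPATCH_ROM_1[opcode]  # Dispatch 1
    if addr_ctl == 2:
        return DISPATCH_ROM_2[opcode]  # Dispatch 2
    if addr_ctl == 3:
        return state + 1               # Seq: next microinstruction in order
    raise ValueError("AddrCtl must be in the range 0..3")
```

For example, a sw instruction in state 1 with AddrCtl = 1 goes to state 2, and then in state 2 with AddrCtl = 2 goes to state 5.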
A Review of
Datapath & Control
Note the reason for each control signal; also note that we have included the jump instruction
A Review of the Instruction
Execution Steps
5. For lw instruction only (write data from MDR to register): Reg[IR[20:16]]<= MDR; (State 4)
A Symbolic Microprogram
A specification methodology
appropriate if hundreds of opcodes, modes, cycles, etc.
signals specified symbolically using microinstructions
E.g. Read PC = Read memory using PC as address and write result into IR (& MDR) (see
next slide for details)
Our symbolic microprogram with 10 microinstructions:
(- marks an empty field)
Label      ALU control   SRC1   SRC2      Register control   Memory      PCWrite control   Sequencing
Fetch      Add           PC     4         -                  Read PC     ALU               Seq
-          Add           PC     Extshft   Read               -           -                 Dispatch 1
Mem1       Add           A      Extend    -                  -           -                 Dispatch 2
LW2        -             -      -         -                  Read ALU    -                 Seq
-          -             -      -         Write MDR          -           -                 Fetch
SW2        -             -      -         -                  Write ALU   -                 Fetch
Rformat1   Func code     A      B         -                  -           -                 Seq
-          -             -      -         Write ALU          -           -                 Fetch
BEQ1       Subt          A      B         -                  -           ALUOut-cond       Fetch
JUMP1      -             -      -         -                  -           Jump address      Fetch
Control Signals for Each Symbol
in Each Field in the Microprogram
Each entry gives: value: signals active; comment.
ALU control:
    Add: ALUOp = 00; cause the ALU to add.
    Subt: ALUOp = 01; cause the ALU to subtract; this implements the compare for branches.
    Func code: ALUOp = 10; use the instruction's function code to determine ALU control.
SRC1:
    PC: ALUSrcA = 0; use the PC as the first ALU input.
    A: ALUSrcA = 1; register A is the first ALU input.
SRC2:
    B: ALUSrcB = 00; register B is the second ALU input.
    4: ALUSrcB = 01; use 4 as the second ALU input.
    Extend: ALUSrcB = 10; use output of the sign extension unit as the second ALU input.
    Extshft: ALUSrcB = 11; use the output of the shift-by-two unit as the second ALU input.
Register control:
    Read: (no signals); read two registers using the rs and rt fields of the IR as the register numbers and put the data into registers A and B.
    Write ALU: RegWrite = 1, RegDst = 1, MemtoReg = 0; write a register using the rd field of the IR as the register number and the contents of ALUOut as the data.
    Write MDR: RegWrite = 1, RegDst = 0, MemtoReg = 1; write a register using the rt field of the IR as the register number and the contents of the MDR as the data.
Memory:
    Read PC: MemRead = 1, IorD = 0, IRWrite = 1; read memory using the PC as address; write result into IR (and the MDR).
    Read ALU: MemRead = 1, IorD = 1; read memory using ALUOut as address; write result into MDR.
    Write ALU: MemWrite = 1, IorD = 1; write memory using ALUOut as address, contents of B as the data.
PC write control:
    ALU: PCSource = 00, PCWrite = 1; write the output of the ALU into the PC.
    ALUOut-cond: PCSource = 01, PCWriteCond = 1; if the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
    Jump address: PCSource = 10, PCWrite = 1; write the PC with the jump address from the instruction.
Sequencing:
    Seq: AddrCtl = 11; choose the next microinstruction sequentially.
    Fetch: AddrCtl = 00; go to the first microinstruction to begin a new instruction.
    Dispatch 1: AddrCtl = 01; dispatch using the ROM 1.
    Dispatch 2: AddrCtl = 10; dispatch using the ROM 2.
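Translating a symbolic microinstruction into control signals is a table lookup per field. The sketch below shows the idea in Python under assumptions: the dictionary layout, the field keys (`alu`, `src1`, and so on), and the function `expand` are invented for illustration, while the signal values come from the table above.

```python
# For each field, map a symbolic value to the control signals it sets.
FIELD_SIGNALS = {
    "alu": {"Add": {"ALUOp": 0b00}, "Subt": {"ALUOp": 0b01}, "Func code": {"ALUOp": 0b10}},
    "src1": {"PC": {"ALUSrcA": 0}, "A": {"ALUSrcA": 1}},
    "src2": {"B": {"ALUSrcB": 0b00}, "4": {"ALUSrcB": 0b01},
             "Extend": {"ALUSrcB": 0b10}, "Extshft": {"ALUSrcB": 0b11}},
    "register": {"Read": {},
                 "Write ALU": {"RegWrite": 1, "RegDst": 1, "MemtoReg": 0},
                 "Write MDR": {"RegWrite": 1, "RegDst": 0, "MemtoReg": 1}},
    "memory": {"Read PC": {"MemRead": 1, "IorD": 0, "IRWrite": 1},
               "Read ALU": {"MemRead": 1, "IorD": 1},
               "Write ALU": {"MemWrite": 1, "IorD": 1}},
    "pc_write": {"ALU": {"PCSource": 0b00, "PCWrite": 1},
                 "ALUOut-cond": {"PCSource": 0b01, "PCWriteCond": 1},
                 "Jump address": {"PCSource": 0b10, "PCWrite": 1}},
    "sequencing": {"Seq": {"AddrCtl": 0b11}, "Fetch": {"AddrCtl": 0b00},
                   "Dispatch 1": {"AddrCtl": 0b01}, "Dispatch 2": {"AddrCtl": 0b10}},
}


def expand(microinstruction):
    """Merge the signals of every non-empty field of one symbolic microinstruction."""
    signals = {}
    for field, symbol in microinstruction.items():
        signals.update(FIELD_SIGNALS[field][symbol])
    return signals


# The Fetch microinstruction: Add, PC, 4, Read PC, ALU, Seq.
fetch = {"alu": "Add", "src1": "PC", "src2": "4",
         "memory": "Read PC", "pc_write": "ALU", "sequencing": "Seq"}
```

Calling `expand(fetch)` yields the same signal settings as state 0 of the FSM, plus the AddrCtl value for sequential execution. Fields left out of a microinstruction contribute no signals, which corresponds to leaving them deasserted.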
Maximally vs Minimally
Encoded
No encoding of control signals in microinstruction format (horizontal microprogram):
1 bit for each control signal in datapath operation; e.g. control signals s, t, u, v, w, x, y, z will
occupy 8 bits in microinstruction
faster, but requires more memory (logic)
used for Vax 780, with an astonishing 400K of control memory!
An exception or an interrupt causes an unexpected change in control flow: how does the
control unit handle an exception/interrupt?
- In case of an exception, the processor should:
  - save the address of the offending instruction in the exception program counter (EPC)
  - indicate the reason for the exception in the Cause register (status register)
  - transfer control to the operating system at some specified address (the OS can then provide some
service: taking a predefined action in response to overflow, or stopping the program & reporting an
error). If the OS continues program execution, it uses EPC to determine where to restart
- Another way is vectored interrupts:
  - the address to which control is transferred is determined by the cause of the exception
Exceptions Handling
by Control Unit
- Control unit:
  - two more control signals: EPCWrite & CauseWrite; also IntCause
  - modify the mux to PC to a 4-way mux to allow the exception address into PC (the
exception address is the OS entry point for exception handling, and is 8000 0180hex for
MIPS)
- To handle two types of exceptions: undefined instruction & arithmetic
overflow
  - add two states to the state diagram to do the above: one when no state is defined for
the op value at state 1 (then → state 10), the other when overflow is detected
from the ALU in state 7 (then → state 11)
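What the two exception states do can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the slides' specification: the IntCause encodings (0 for undefined instruction, 1 for arithmetic overflow) are hypothetical, and EPC is computed as PC - 4 on the assumption that PC was already incremented during instruction fetch.

```python
EXCEPTION_VECTOR = 0x80000180  # OS entry point for exception handling (MIPS)

# Hypothetical IntCause encodings, chosen for this sketch only.
UNDEFINED_INSTRUCTION = 0      # raised from state 1, handled in state 10
ARITHMETIC_OVERFLOW = 1        # raised from state 7, handled in state 11


def take_exception(pc, int_cause):
    """Return (new_pc, epc, cause) after an exception state completes."""
    # PC already points past the offending instruction, so back up by 4
    # (assumption: the fetch step has already performed PC <= PC + 4).
    epc = (pc - 4) & 0xFFFFFFFF   # EPCWrite
    cause = int_cause             # CauseWrite, value selected by IntCause
    new_pc = EXCEPTION_VECTOR     # fourth input of the PC mux
    return new_pc, epc, cause
```

After the handler runs, the OS can use the saved EPC to decide where, or whether, to resume the program.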
Chapter Summary
Part 1:
Elements of datapath: instruction subset, resources, clocking method
Datapath for different instruction classes
Building single-cycle datapath: multiplexors, functional units, control signals
Single-cycle datapath control unit logic: ALU control, main control
Single-cycle datapath & control: complete picture, critical path, problems
Part 2:
Multi-cycle datapath: approach, additional registers & multiplexors, control signals
Breaking instructions into execution steps
Multi-cycle datapath & control: complete picture
Finite state machine (FSM) (hardwired) control & controller implementation
Microprogramming: control, microinstruction format, controller implementation,
symbolic microprogram & its control signals, issues
Exception Handling