Appendix A

Pipelining: Basic and Intermediate Concepts
Overview
Introduction
  Pipeline concepts
  Basics of RISC instruction set
  Classic 5-stage pipeline
Pipeline Hazards
  Stalls, structural hazards, data hazards
  Branch hazards
Pipeline Implementation
  Simple MIPS pipeline
Implementation Difficulties for Pipelines
  Exceptions, instruction set complications
Extending MIPS Pipeline to Multicycle Operations
Example: MIPS R4000 Pipeline
Pipelining
Similar to an assembly line
Widget definition (unpipelined): add part A, then B, then C, then D
Widget definition (pipelined): separate steps
  Add part A
  Add part B
  Add part C
  Add part D
CPU Pipelining
Unpipelined: each instruction takes multiple clock cycles, or one LONG clock cycle
  Fetch, then decode, then execute, then access memory if needed, then write results if needed
Pipelined: five stages of 1 cycle each, with a different instruction in each stage
  Fetch, Decode, Execute, Memory, Write Results
Performance and Pipelining

For the following assumptions:
  N stages in the pipeline
  Unpipelined execution time for 1 instruction is T
  Pipeline stages are equal and perfectly balanced
Then
  Time per instruction for the pipelined version = T / N
  Throughput increase is N
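To illustrate the ideal case above, the following Python sketch compares total execution time for k instructions with and without a pipeline. It assumes perfectly balanced stages, no hazards and no pipeline register overhead; the function names are illustrative and not part of the slides.

```python
def unpipelined_time(k, t):
    """Total time for k instructions when each takes t and none overlap."""
    return k * t


def pipelined_time(k, t, n):
    """Total time for k instructions on an ideal n-stage pipeline.

    Each stage takes t / n. The first instruction needs n stage times to
    pass through the pipe; every later instruction completes one stage
    time after its predecessor.
    """
    return (n + k - 1) * (t / n)


def ideal_speedup(k, t, n):
    """Ratio of unpipelined to pipelined total time for k instructions."""
    return unpipelined_time(k, t) / pipelined_time(k, t, n)


if __name__ == "__main__":
    for k in (1, 10, 1000):
        print(k, round(ideal_speedup(k, t=5.0, n=5), 3))
```

For a single instruction there is no gain; the speedup approaches N only as the instruction count grows and the fill time of the pipeline is amortized.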
Advantages of Pipelining
Significant speedup without much additional hardware.
Invisible to the programmer
RISC (MIPS) Pipeline

All ALU operations operate on registers
Only load and store affect memory
Load and store of 8-, 16-, and 32-bit items available
Few instruction formats
All instructions the same size
Non-pipelined Implementation
Multi-cycle implementation
Simplified to better understand the transition to the pipelined version
Not the most efficient implementation
Datapath
Control
Simplified Datapath
Main components:
  Program Counter
  Instruction Register
  Branch Target
  Memory (instructions and data)
  Register File
  ALU
Not shown:
  Multiplexors
  Control signals
  Sign extend and shift modules
Simplified Control State Diagram

[State diagram] IFetch, then IDecode, then by instruction type:
  lw/sw: AddrCal, then LWmem and LWwrite (lw) or SWmem (sw)
  Rtype: Rexec, then Rfinish
  Immed: ImmExec, then ImmFinish
  branch: Brcomplete
Multi-cycle Implementation
At most 5 cycles to implement an instruction
  Branch: 3 cycles
  Load: 5 cycles
  Others: 4 cycles
Assume the following instruction frequencies:
  Branch: 12%
  Load: 10%
  Others: 78%
CPI = (.12*3) + (.1*5) + (.78*4) = 3.98 cycles per instruction
Pipelined Version
Each of the 5 clock cycles becomes a pipe stage
IF, ID, EX, MEM, WB
Use separate data and instruction memories
  Implemented with two caches
  Eliminates conflicts between instruction fetch and memory access
Stages
IF: use PC to address the current instruction in memory; update PC
ID: decode instruction and read registers from register file; do equality test on register; sign-extend offset field; compute possible branch target
EX: ALU operates on operands (memory address calculation, register-register operation, or register-immediate operation)
MEM: if a load, read memory; if a store, write memory
WB: for register-register or load instructions, write the result back to the register file
Simplified Pipelined Datapath

Separate Instruction Memory and Data Memory
Pipeline Registers between stages
Register file (just one)
  Read registers after fetch
  Write register after data memory access
Pipeline Execution
Stages: IF, ID, EX, MEM, WB
Pipeline registers, named for the stages they separate: IF/ID, ID/EX, EX/MEM, MEM/WB
Some Issues
Register file used in two stages
  Two register reads (two operands) and one register write during a single clock cycle
PC needed in IF stage and must be updated on every clock cycle
Adder needed in ID to compute the branch target for branch/jump instructions
Branch does not change PC until ID stage; the next instruction has already been fetched at that point
Instruction Timing
Throughput is increased approximately by 5
Execution time of an individual instruction INCREASES due to pipelining overhead
  Pipeline register delay
  Clock skew
  T = TCL + Tsu + Treg + Tskew
Important to balance pipeline stages, since the clock is matched to the slowest stage (TCL)
Example
Unpipelined: 1 GHz clock (T = 1 ns)
  ALU        4 cycles   40%
  Branches   4 cycles   20%
  Memory     5 cycles   40%
If pipelined, T increases by Tskew + Tsu + Treg = 0.2 ns
How much speedup from a 5-stage pipeline?
Unpipelined execution time: E.Time_u = T * CPI
  CPI = (.4*4) + (.2*4) + (.4*5) = 4.4
  E.Time_u = 1 ns * 4.4 = 4.4 ns
Pipelined execution time (CPI = 1): E.Time_p = T = 1.2 ns
Speedup = E.Time_u / E.Time_p = 4.4 / 1.2 = 3.7
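The arithmetic of this example can be reproduced with a short Python sketch. The variable names are illustrative; the numbers are those given in the example, and the pipelined CPI of 1 is the ideal-pipeline assumption.

```python
def average_cpi(mix):
    """mix is a list of (frequency, cycles) pairs whose frequencies sum to 1."""
    return sum(freq * cycles for freq, cycles in mix)


CLOCK_NS = 1.0       # unpipelined clock period (1 GHz)
OVERHEAD_NS = 0.2    # Tskew + Tsu + Treg added by pipelining
mix = [(0.4, 4), (0.2, 4), (0.4, 5)]   # ALU, branches, memory

time_unpipelined = CLOCK_NS * average_cpi(mix)
time_pipelined = (CLOCK_NS + OVERHEAD_NS) * 1.0   # ideal pipelined CPI of 1
speedup = time_unpipelined / time_pipelined

print(round(time_unpipelined, 2), round(speedup, 2))
```

The speedup falls short of 5 both because the unpipelined CPI is below 5 and because the pipelined clock period is longer.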
Pipeline Hazards
Structural Hazards: resource conflicts when more than one instruction needs a resource
Data Hazards: an instruction depends on a result from a previous instruction that is not yet available
Control Hazards: conflicts from branches and jumps that change the PC
Pipeline Stall
Stalling the pipeline is one solution to some hazards
Performance with Stalls

CPI pipelined = Ideal CPI + stall cycles per instruction = 1 + stall cycles per instruction
Pipelining speedup (ideal) = CPI unpipelined / (1 + stall cycles per instruction)
Simple case: CPI unpipelined = # of pipeline stages
  Pipelining speedup (ideal) = # of pipeline stages / (1 + stall cycles per instruction)
  For no stalls, pipeline speedup = # of stages
Pipelining speedup (actual) = (T unpipelined / T pipelined) * # of pipeline stages / (1 + stall cycles per instruction)
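As a minimal sketch of these formulas, the Python functions below compute the pipelined CPI and the resulting speedup. The function and parameter names are illustrative, and the stall rate of 0.25 cycles per instruction in the usage lines is a hypothetical value, not a figure from the slides.

```python
def pipelined_cpi(stall_cycles_per_instr, ideal_cpi=1.0):
    """CPI pipelined = ideal CPI + stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instr


def pipelining_speedup(cpi_unpipelined, stall_cycles_per_instr,
                       t_unpipelined=1.0, t_pipelined=1.0):
    """Speedup including the clock period ratio (1.0 gives the ideal form)."""
    clock_ratio = t_unpipelined / t_pipelined
    return clock_ratio * cpi_unpipelined / pipelined_cpi(stall_cycles_per_instr)


if __name__ == "__main__":
    # Simple case: unpipelined CPI equals the number of stages (5 here).
    print(pipelining_speedup(5, 0.0))              # no stalls
    print(pipelining_speedup(5, 0.25))             # hypothetical stall rate
    print(pipelining_speedup(5, 0.25, 1.0, 1.2))   # plus a slower pipelined clock
```

Leaving the two clock periods at their defaults reduces the function to the ideal form, so one definition covers both cases.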
Structural Hazards (Resource Conflicts)
If Data Memory and Instruction Memory are implemented with a single memory, then this can cause a structural hazard
Data Hazards (Instruction Dependencies)
DADD R1, R2, R3
Hazards:
  DSUB R4, R1, R5
  AND  R6, R1, R7
No hazards:
  OR   R8, R1, R9
  XOR  R10, R1, R11
Forwarding
Solution for hazards
Also called bypassing or short-circuiting
  Create a potential datapath from where the result is calculated to where it is needed by another instruction
  Detect the hazard to route the result
Example:
  DADD R1, R2, R3
  DSUB R4, R1, R5    (forwarding path from DADD)
  AND  R6, R1, R7
  OR   R8, R1, R9
  XOR  R10, R1, R11
[Figure: forwarding paths from DADD to the dependent instructions; register file reads and writes are marked]
More Forwarding
Remaining Stalls
Some data hazards cannot be resolved by forwarding:
  LD   R1, 0(R2)
  DSUB R4, R1, R5
  AND  R6, R1, R7
  OR   R8, R1, R9
Hazard detection causes a stall until the hazard is cleared (hardware interlock)
Stall to Solve Data Hazard

LD   R1, 0(R2)    IF     ID     EX     MEM    WB
DSUB R4, R1, R5          IF     ID     stall  EX     MEM    WB
AND  R6, R1, R7                 IF     stall  ID     EX     MEM    WB
OR   R8, R1, R9                        stall  IF     ID     EX     MEM    WB
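The stall timing above can be reproduced with a small Python sketch. It is a simplified model, not the slides' hardware: it assumes full forwarding, so the only stall modeled is a load whose result is needed by the very next instruction, and the tuple format for instructions is an illustrative choice.

```python
def schedule(program):
    """Return the cycle in which each instruction enters IF, ID, EX, MEM, WB.

    program is a list of (opcode, dest, sources) tuples in program order.
    """
    rows = []
    for i, (_, _, sources) in enumerate(program):
        if i == 0:
            t_if, t_id, t_ex = 1, 2, 3
        else:
            prev = rows[-1]
            prev_op, prev_dest, _ = program[i - 1]
            # A stage is free once the previous instruction has left it.
            t_if, t_id, t_ex = prev["ID"], prev["EX"], prev["MEM"]
            if prev_op == "LD" and prev_dest in sources:
                # Loaded data exists only after MEM, so EX waits one cycle.
                t_ex += 1
        rows.append({"IF": t_if, "ID": t_id, "EX": t_ex,
                     "MEM": t_ex + 1, "WB": t_ex + 2})
    return rows


program = [
    ("LD", "R1", ("R2",)),
    ("DSUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
]
for (op, _, _), row in zip(program, schedule(program)):
    print(f"{op:5}", row)
```

In this model DSUB enters EX in cycle 5 instead of cycle 4, and every later instruction is delayed by the same single cycle.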
Branch Hazards
      BEQZ R1, Name
      Instr. 1          (next if branch not taken)
      Instr. 2
      Instr. 3
Name: Instr. 4          (next if branch taken)
Pipeline hazard solution using a fetch redo after branch:

BEQZ R1, Name           IF   ID   EX   MEM  WB
Instr. 1                     IF
Instr. 1 or Instr. 4              IF   ID   EX   MEM  WB
Branch redo penalties

If the branch is not taken, the second fetch is redundant
Always stalling after a branch results in 10% to 30% performance loss
Reducing Branch Penalty

Compile time solutions
  Decide on a hardware action
  Compiler tries to use this knowledge
1. Pipeline freeze (or flush)
2. Predicted-not-taken
3. Predicted-taken
4. Delayed branch
Pipeline Freeze
Hold or delete all instructions after a branch until the target address is known.
Simple to implement
Results in a 1-cycle stall for MIPS
Longer stalls for other pipeline architectures
Predicted-not-taken
Execute successor instructions in sequence
Squash instructions in the pipeline if the branch is actually taken
Must be careful not to alter the state of registers until the actual branch target is known
Only slightly more complicated to implement than pipeline freeze
Compiler can modify loops to favor branches not taken
Predicted-taken
Treat every branch as taken
As soon as the branch is decoded and the target address is computed, begin fetching at the target
No advantage for MIPS, because the target address is not known any earlier than the branch outcome
Only makes sense for machines that compute the target address before determining the branch outcome
Delayed Branch
Execute the instruction after the branch no matter what
Fetch the subsequent instruction depending on the branch outcome
  branch
  sequential successor instruction
  branch target if taken
Compiler's job is to put a useful instruction in the sequential successor slot; otherwise a NOP is used
Performance with Branches
Pipeline speedup = Pipeline depth / (1 + Branch frequency x Branch penalty)
For the schemes just mentioned, penalty is at most 1 cycle
Penalty is more for deeper pipelines
Details of pipeline implementation
So that other issues can be explored
Look at non-pipelined implementation first

Focus on integer subset of MIPS
  Load-store word
  Branch equal zero
  Integer ALU operations
Basic principles can be extended to all instructions
1. Instruction fetch cycle (IF)
  IR ← Mem[PC]
  NPC ← PC + 4
2. Instruction decode/register fetch cycle (ID)
  Decode opcode
  A ← Regs[rs]
  B ← Regs[rt]
  Imm ← sign-extended immediate field of IR
3. Execution/effective address cycle (EX)
  ALUOutput ← A + Imm (memory reference) OR
  ALUOutput ← A func B (register-register ALU op) OR
  ALUOutput ← A op Imm (register-immediate ALU op) OR
  ALUOutput ← NPC + (Imm << 2); Cond ← (A == 0) (branch)
4. Memory access/branch completion cycle (MEM)
  LMD ← Mem[ALUOutput] (load) OR
  Mem[ALUOutput] ← B (store)
5. Write-back cycle (WB)
  Regs[rd] ← ALUOutput (register-register ALU) OR
  Regs[rt] ← ALUOutput (register-immediate) OR
  Regs[rt] ← LMD (load)
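As a rough sketch of how the five cycles fit together, the Python function below walks a single load through them using the same temporaries (NPC, A, B, ALUOutput, LMD). The dictionary-based register file and memory, the function name, and the sample values are assumptions made for illustration; the instruction word itself is not modeled.

```python
def execute_load(regs, mem, pc, rs, rt, imm):
    """Walk LD rt, imm(rs) through the five cycles and return the temporaries."""
    # 1. IF: the instruction word is not modeled; only the next PC is computed.
    npc = pc + 4
    # 2. ID: read both register operands; imm is assumed already sign-extended.
    a = regs[rs]
    b = regs[rt]
    # 3. EX: effective address for a memory reference.
    alu_output = a + imm
    # 4. MEM: a load reads memory into LMD.
    lmd = mem[alu_output]
    # 5. WB: a load writes LMD into Regs[rt].
    regs[rt] = lmd
    return {"NPC": npc, "A": a, "B": b, "ALUOutput": alu_output, "LMD": lmd}


regs = {1: 0, 2: 1000}    # hypothetical register contents
mem = {1055: 42}          # hypothetical memory word at address 1055
temps = execute_load(regs, mem, pc=0, rs=2, rt=1, imm=55)   # LD R1, 55(R2)
print(temps["ALUOutput"], regs[1])
```

In the pipelined version these temporaries become fields of the pipeline registers rather than standalone registers.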
Multicycle Datapath
Example: Datapath for LD R1, 55(R2)
[Figure: multicycle datapath annotated by cycle (CYCLE 1 through CYCLE 5). Labels: rs = R2, rd = R1, Imm = 55, Regs[R2], address for LD instruction = Regs[R2]+55, Mem(Regs[R2]+55)]
Add Pipeline Registers
PC can also be considered a pipeline register
Pipeline registers take place of IR, A, B, Imm, ALUoutput, LMD
Stage by Stage Operation

Figure A.19 pp A-32
Each pipeline register has fields
Example: IF/ID.IR is the IR field of the IF/ID pipeline register
Pipeline Control
Control signals needed for
  MUXs
  Register (write)
  ALU (function)
  Data memory (read/write)
Control Complications
Instruction issue: when an instruction transfers from the ID stage into the EX stage
Data hazards are checked in the ID stage
  If a stall is required, the instruction is stalled before it is issued
  If forwarding is needed, controls are set
Hazard Detection Situations

Requires stall:
  LD   R1, 45(R2)
  DADD R5, R1, R7
Comparators detect the use of R1 in the DADD and stall the DADD (and future instructions) before the DADD begins EX.
Requires forwarding:
  LD   R1, 45(R2)
  DADD R5, R6, R7
  DSUB R8, R1, R7
Comparators detect the use of R1 in the DSUB and forward the result of the load to the ALU in time for the DSUB to begin EX.
Load Interlocks
Recall that the following code requires a stall or load interlock to prevent Read After Write (RAW) hazards
  LD   R1, 0(R2)
  DSUB R4, R1, R5
  AND  R6, R1, R7
  OR   R8, R1, R9
Hazard can be detected in the ID stage by comparing rt and rs registers
Load Interlock Detection Logic

Opcode field of ID/EX    Opcode field of IF/ID                    Matching operand fields
(ID/EX.IR0..5)           (IF/ID.IR0..5)
Load                     Register-register ALU                    ID/EX.IR[rt] == IF/ID.IR[rs]
Load                     Register-register ALU                    ID/EX.IR[rt] == IF/ID.IR[rt]
Load                     Load, store, ALU immediate, or branch    ID/EX.IR[rt] == IF/ID.IR[rs]
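The comparisons in the detection table can be sketched in Python as follows. The `Instr` record and the instruction-kind strings are hypothetical stand-ins for the decoded opcode fields; real hardware performs these comparisons with comparators on the pipeline register fields.

```python
from collections import namedtuple

# kind is a decoded opcode class; rs and rt are register numbers.
Instr = namedtuple("Instr", "kind rs rt")

# Instruction classes in IF/ID for which only rs is checked against the load.
CHECK_RS_ONLY = {"load", "store", "alu_imm", "branch"}


def needs_load_interlock(id_ex, if_id):
    """True if the instruction in IF/ID must stall behind a load in ID/EX."""
    if id_ex.kind != "load":
        return False
    if if_id.kind == "alu_rr":
        # Register-register ALU ops read both rs and rt.
        return id_ex.rt in (if_id.rs, if_id.rt)
    if if_id.kind in CHECK_RS_ONLY:
        return id_ex.rt == if_id.rs
    return False


load = Instr("load", rs=2, rt=1)        # LD R1, 45(R2)
dadd_dep = Instr("alu_rr", rs=1, rt=7)  # DADD R5, R1, R7
dadd_ind = Instr("alu_rr", rs=6, rt=7)  # DADD R5, R6, R7
print(needs_load_interlock(load, dadd_dep), needs_load_interlock(load, dadd_ind))
```

Only the instruction directly behind the load is checked; a consumer one instruction further back can receive the loaded value by forwarding.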
Implementing a Stall after Detection

Change the opcode in the ID/EX pipeline register to 00000 (NOP)
Recirculate the contents of the IF/ID register to hold the stalled instruction
Forwarding Logic
Detection is similar to detecting RAW hazards, but with more cases
All forwarded values originate at the ALU or data memory output
They terminate at the ALU input, data memory input, or zero detection unit
Forwarding Logic
Additional MUX inputs and paths
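One common way to express the selection made by a forwarding MUX is sketched below in Python. This is an assumption about a typical design rather than the exact logic of the figure: it covers a single ALU input, treats R0 as hardwired to zero (true for MIPS), and assumes the value held in EX/MEM is an ALU result (a load in EX/MEM is the case handled by the load interlock). The function name and return labels are illustrative.

```python
def forward_select(src_reg, ex_mem_dest, mem_wb_dest):
    """Choose where one ALU input should come from.

    ex_mem_dest and mem_wb_dest are the destination registers of the
    instructions currently held in EX/MEM and MEM/WB, or None if that
    instruction writes no register.
    """
    if src_reg == 0:
        return "regfile"      # R0 always reads as zero, never forwarded
    if ex_mem_dest == src_reg:
        return "EX/MEM"       # the most recent producer wins
    if mem_wb_dest == src_reg:
        return "MEM/WB"
    return "regfile"


# DADD R1, R2, R3 followed by DSUB R4, R1, R5: DADD is in EX/MEM.
print(forward_select(1, ex_mem_dest=1, mem_wb_dest=None))
# One instruction later, AND R6, R1, R7 sees DADD in MEM/WB.
print(forward_select(1, ex_mem_dest=4, mem_wb_dest=1))
```

Checking EX/MEM before MEM/WB matters when both hold a write to the same register, since the younger result is the one the program expects.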
Branches in Pipeline
Consider only BEQZ and BNEZ (branch if equal to zero or not equal to zero)
For these it is possible to move the test to the ID stage
To take advantage of the early decision, the target address must also be computed early
  Must add another adder for computing the target address in ID
Result is a 1-cycle stall on branches
Branches on the result of a register from the previous ALU operation will result in a data hazard stall
Logic for Early Test and Target Address

[Figure: pipelined datapath with an additional adder and a zero test]
Compare to Previous Version
Exceptions and Pipelines

Exceptions can come from several sources and can be classified several ways
Sources
  I/O device interrupt
  Invoking OS from user program
  Tracing program execution
  Breakpoint
  Integer arithmetic overflow or underflow, FP trap
  Page fault
  Misaligned memory accesses
  Memory protection violation
  Undefined instruction
  Hardware malfunction
  Power failure
Exceptions and Pipelines

Exception characteristics
Synchronous (from within CPU) vs asynchronous
  Asynchronous: caused by devices external to the CPU and memory
User requested vs coerced
  User requests from a program are predictable
  Coerced requests come from some hardware event outside the control of the user program
User maskable vs user non-maskable
  Masks control whether hardware responds to the exception or not
Within vs between instructions
  Within: usually synchronous, since the instruction triggers the exception
  Asynchronous exceptions within instructions are catastrophic and cause program termination
Resume vs terminate
  Terminating: program execution always stops after the interrupt
  Resuming: program execution continues after the interrupt is handled
  Resuming exceptions are harder to handle
Classifications of Exception Types

Figure A.27 in text
Most difficult: synchronous, coerced exceptions within instructions that can be resumed
Restartable pipelines or processors can handle these
Example: Virtual Memory Page Fault

Classification: synchronous, coerced, within instruction, resume
Saving Pipeline State

Force a trap instruction into the pipeline on the next IF
Until the trap is taken, turn off all writes for the faulting instruction and all that follow
After the trap receives control, immediately save the PC of the faulting instruction, to return from the trap later
For delayed branch pipelines, need to save and restore as many PCs as the length of the branch delay plus one (compilers put instructions out of order)
After the exception is handled, return from the exception by reloading the PCs and restarting the instruction stream
Precise exceptions: the pipeline can always be stopped so that the instructions just before the faulting instruction are completed and those after it can be restarted
Floating point instructions tend to take many cycles, so it is difficult to have precise exceptions
Some CPUs have 2 modes of operation
  Precise exception mode: allows less overlap among floating point instructions (slower)
  Fast performance mode
Almost all integer pipelines support precise exceptions
Precise Exceptions in MIPS

Possible exceptions in MIPS stages
  IF: page fault on instruction fetch; misaligned memory access; memory protection violation
  ID: undefined or illegal opcode
  EX: arithmetic exception
  MEM: page fault on data fetch; misaligned memory access; memory protection violation
  WB: none
Multiple exceptions can occur on the same clock cycle
Example
LD      IF   ID   EX   MEM  WB         (data page fault in MEM)
DADD         IF   ID   EX   MEM  WB    (arithmetic exception in EX)
1. Deal with the page fault, redo the DADD
2. Deal with the DADD arithmetic exception, which will occur again
But: exceptions can occur out of order
Alternate solution:
  Hardware posts all exceptions in a status vector
  Control signals that write data are turned off
  When an instruction enters WB, its exception status vector is checked
  Exceptions of the earliest instructions are handled first
Helpful MIPS Pipeline Features

No instruction updates the state of the processor (registers or memory) before the MEM stage
ISA Complications for Pipelining

Instructions that change processor state at multiple stages in the pipeline
  Example: autoincrement, autodecrement
  Must provide a way to back out of an instruction even after it is partially completed
  Usually requires the storage of extra state
The use of condition codes
  Restricts the reordering of instructions that is often useful for delay slots after branches
  Deciding when condition codes are fixed complicates exception handling as well as hazard detection
ISA Complications for Pipelining

Multi-cycle operations (operations that take a variable number of cycles based on operands)
Example: Move Character String where instruction specifies address and length of string
Some ISAs are just too complex to be pipelined efficiently
MIPS Multicycle Operations

Floating point operations
  Load/Store
  Add
  Multiply (multicycle for integers too)
  Divide (multicycle for integers too)
Multiple Cycles in Execution Stage
[Figure: IF, ID, EX, MEM, WB, with the EX stage taking multiple cycles]
One Approach
4 separate function units for the EX stage
  Integer unit takes 1 clock cycle
  FP units take multiple cycles
Instruction issue: allowing an instruction to move from the ID to the EX phase
Could Pipeline the Function Units

Allows some overlap of instructions
Difficult to pipeline the divider
Divide unit takes 24 clock cycles, but is NOT pipelined
Definitions
Latency: the number of cycles between when an instruction produces a result and when the next instruction can use the result
  Integer ALU: latency = 0
  Loads: latency = 1
  FP Add: latency = 3
  FP Mult: latency = 6
  FP Div: latency = 24
  (generally 1 cycle less than the number of stages in the function unit pipeline)
Results are consumed at the beginning of the EX stage
Definitions
Initiation interval: the number of cycles that must elapse between issuing two operations of a given type
  Integer ALU, Loads, FP Add, FP Mult: initiation interval = 1
  Divide: initiation interval = 25
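The two definitions translate directly into stall counts, as in the following Python sketch. The tables hold the values given in these definitions; the function names, unit keys, and the `gap` parameter (the number of independent instructions issued between the two operations) are illustrative, and the consumer is assumed to need its operand at the start of EX.

```python
LATENCY = {"int_alu": 0, "load": 1, "fp_add": 3, "fp_mul": 6, "fp_div": 24}
INITIATION_INTERVAL = {"int_alu": 1, "load": 1, "fp_add": 1,
                       "fp_mul": 1, "fp_div": 25}


def raw_stalls(producer_unit, gap=0):
    """Stall cycles for a consumer issued `gap` instructions after its producer."""
    return max(0, LATENCY[producer_unit] - gap)


def structural_stalls(unit, gap=0):
    """Stall cycles for a second operation on the same unit, `gap` instructions later."""
    # Without stalls the second operation would issue gap + 1 cycles after the first.
    return max(0, INITIATION_INTERVAL[unit] - (gap + 1))


if __name__ == "__main__":
    print(raw_stalls("load"), raw_stalls("fp_mul"), raw_stalls("fp_mul", gap=2))
    print(structural_stalls("fp_add"), structural_stalls("fp_div"))
```

Independent instructions placed between a producer and its consumer hide latency one cycle at a time, which is why instruction scheduling matters more for the longer FP units.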
Example Pipeline Timing

MUL.D   IF   ID   M1   M2   M3   M4   M5   M6   M7   MEM  WB
ADD.D        IF   ID   A1   A2   A3   A4   MEM  WB
L.D               IF   ID   EX   MEM  WB
S.D                    IF   ID   EX   MEM  WB
Legend in the original figure: stages where data is needed; stages where results are available
Hazards and Forwarding

Multi-cycle, non-pipelined divide unit can cause structural hazards, which must be detected
Varying run times mean that there can be multiple register writes in a clock cycle
Instructions don't reach WB in order, so Write After Write (WAW) hazards are possible
Exceptions are complicated by out-of-order completion of instructions
Longer latency results in more frequent stalls for Read After Write (RAW) hazards
Multiple Register Writes

Clock cycle number
Instruction   1    2    3    4    5    6    7    8    9    10   11
MUL.D         IF   ID   M1   M2   M3   M4   M5   M6   M7   MEM  WB
...                IF   ID   EX   MEM  WB
...                     IF   ID   EX   MEM  WB
ADD.D                        IF   ID   A1   A2   A3   A4   MEM  WB
...                               IF   ID   EX   MEM  WB
...                                    IF   ID   EX   MEM  WB
L.D                                         IF   ID   EX   MEM  WB
MUL.D, ADD.D, and L.D all reach WB in clock cycle 11
Possible Solutions
Add write ports: probably not a good idea, because it is not a common scenario
Detect the structural hazard and implement an interlock
  Track scheduled write ports in ID and stall there, OR
  Stall the conflicting instruction in the MEM or WB stage
Pros and cons to each approach; stalls in the ID phase will be assumed
WAW Hazards
Clock cycle number
Instruction       1    2    3    4    5    6    7    8    9    10   11
MUL.D             IF   ID   M1   M2   M3   M4   M5   M6   M7   MEM  WB
...                    IF   ID   EX   MEM  WB
...                         IF   ID   EX   MEM  WB
ADD.D F2, F4, F6                 IF   ID   A1   A2   A3   A4   MEM  WB
...                                   IF   ID   EX   MEM  WB
L.D F2, 0(R2)                              IF   ID   EX   MEM  WB
...                                             IF   ID   EX   MEM  WB
L.D writes F2 in cycle 10, but the earlier ADD.D writes F2 in cycle 11 and overwrites it
Review of Hazards
Caused by different lengths of execution unit pipelines.
  Structural hazards: multiple instructions need the same function unit at the same time
  RAW data hazards: an instruction needs to read a value that has not been written yet
  WAW data hazards: writes occur out of order
Handling Hazards
Structural Hazards
Wait to issue instructions if divider is busy, or if the register write port will not be available.
RAW Hazards
Check source registers against pending destinations, stall issue if necessary.
WAW Hazards
Determine if any instruction in the MULT, ADD, or DIV pipeline has the same destination as the instruction being issued; stall issue if necessary.
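The three issue checks can be combined into one function, sketched here in Python. This is a simplified model under stated assumptions: every in-flight destination is treated as not yet available (forwarding is ignored), the write-port check compares precomputed WB cycles, and the `Op` record, its field names, and register F8 in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Op:
    unit: str        # "int", "add", "mul" or "div"
    dest: str        # destination register
    srcs: tuple      # source registers
    wb_cycle: int    # cycle in which the op will use the register write port


def issue_stall_reason(op, in_flight, divider_busy=False):
    """Return why op must stall in ID, or None if it can issue."""
    if op.unit == "div" and divider_busy:
        return "structural: divider busy"
    if any(other.wb_cycle == op.wb_cycle for other in in_flight):
        return "structural: register write port"
    pending = {other.dest for other in in_flight}
    if pending & set(op.srcs):
        return "RAW"
    if op.dest in pending:
        return "WAW"
    return None


add_d = Op("add", "F2", ("F4", "F6"), wb_cycle=11)   # ADD.D F2, F4, F6
l_d = Op("int", "F2", ("R2",), wb_cycle=10)          # L.D F2, 0(R2)
print(issue_stall_reason(l_d, [add_d]))
print(issue_stall_reason(Op("int", "F8", ("R2",), wb_cycle=11), [add_d]))
```

Doing all three checks in ID keeps the rule that an instruction never stalls once it has issued, which matches the assumption that stalls happen in the ID phase.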
Precise Exceptions
Out of order completion makes precise exceptions difficult
Completion time (instructions starting at cycles 0, 1, 2)
  DIV.D F0, F3, F5       cycle 28
  ADD.D F9, F9, F7       cycle 9
  SUB.D F10, F10, F14    cycle 10
No data hazards, so no stalls
If SUB.D causes an exception, ADD.D is already done, but DIV.D is NOT complete
Saving the PC and starting over at SUB.D will not work
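The completion times above follow from the stage counts, as this Python sketch shows. It assumes 24 EX cycles for the divider and 4 for the FP adder (with SUB.D using the adder), plus one cycle each for IF, ID, MEM, and WB; the names are illustrative.

```python
EX_CYCLES = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}   # assumed EX lengths


def completion_time(op, start):
    """Cycle count at which op has finished WB if it is fetched at `start`."""
    return start + 2 + EX_CYCLES[op] + 2   # IF, ID, EX cycles, MEM, WB


issue_order = [("DIV.D", 0), ("ADD.D", 1), ("SUB.D", 2)]
done = {op: completion_time(op, start) for op, start in issue_order}
completion_order = sorted(done, key=done.get)

print(done)
print(completion_order)
```

The completion order differs from the issue order, which is exactly the situation that makes a precise exception at SUB.D hard to provide.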
Solution Options
Buffer results until all previous instructions are complete.
OK as long as the difference in completion times is reasonable. (Lots of storage otherwise)
Allow exceptions to be imprecise, have trap routines create precise sequence.

Requires some buffering also. Trap routine finishes instructions preceding the latest instruction completed before returning.
MIPS FP Pipeline
Stalls to avoid structural and RAW hazards
Stalls per FP operation
  # stalls depends on latency
  # stalls also depends on how many cycles before results are used
  Divide frequency is low, but number of stalls needed is high due to latency
  Average for add/sub/conv = 1.7 (56% of latency)
  Average for mult = 2.8 (46%)
  Average for div = 14.2 (59%)
MIPS R4000 Pipeline

Implements MIPS-64
Deeper pipeline (8 stages): superpipelining
  Higher clock rate (smaller logic in each stage)
  Additional stages from decomposing memory accesses
MIPS R4000 8-Stage Pipeline
Stages:
  IF: first half of instruction fetch
  IS: second half of instruction fetch
  RF: instruction decode and register fetch, hazard checking, instruction cache hit detection
  EX: execution (address calculation, ALU operation, condition evaluation)
  DF: data fetch, first half of data cache access
  DS: second half of data fetch, completion of cache access
  TC: tag check, determine whether the data cache access hit
  WB: write back for loads and register-register operations
Effects of Deeper Pipeline

More forwarding required Load and branch delays increased
  2-cycle stall for RAW hazards with load
  Branch delay is 3 cycles
    Single-cycle branch delay used along with forwarding
    Predicted-not-taken approach used
MIPS R4000 Floating-point Pipeline

3 functional units
  FP adder
  FP multiplier
  FP divider
8 stages, used 0 or many times, in different orders, by different instructions
Large range of completion times (2-112 cycles)
FP Pipeline Stages
Stage   Functional unit   Description
A       FP adder          Mantissa ADD stage
D       FP divider        Divide pipeline stage
E       FP multiplier     Exception test stage
M       FP multiplier     First stage of multiplier
N       FP multiplier     Second stage of multiplier
R       FP adder          Rounding stage
S       FP adder          Operand shift stage
U                         Unpack FP numbers
Instruction Latencies and Initiation Intervals

FP instruction    Latency   Initiation interval
Add, subtract     4         3
Multiply          8         4
Divide            36        35
Square root       112       111
Negate            2         1
Absolute value    2         1
FP compare        3         2
MIPS R4000 Pipeline Performance

Four major causes of pipeline stalls
  Load stalls
  Branch stalls
  FP result stalls (RAW hazards)
  FP structural stalls
CPI for 10 SPEC92 benchmarks
  Branch stalls from the longer pipeline are substantial
  FP structural stalls are sometimes masked by result stalls
Appendix A Summary
For an ideal N-stage pipeline, throughput increase is N over a non-pipelined architecture
Ideal pipelined CPU has CPI = 1
Pipelining has advantages of
  Significant speedup with moderate hardware costs
  Invisible to programmer
Pipeline challenges include
  Structural hazards
  Data hazards
  Control hazards
  Exceptions
  Floating point operations
Appendix A Summary
Solutions include
  Stalls
  Forwarding
  Buffering state (for exceptions)
  Branch delay slots
  Branch prediction
  Several multi-cycle execution units for FP
