CIS 371
Computer Organization and Design
What is an ISA?
And what is a good ISA?
Aspects of ISAs
With examples: LC3, MIPS, x86
[Figure: system layers: App, App, App / System software / Mem, CPU, I/O]
Readings
What Is An ISA?
P+H
Chapter 2
Instruction → Insn
"Instruction" is too long to write in slides
CIS 371 (Roth/Martin): Instruction Set Architectures
Communication
Person-to-person → software-to-hardware
Similar structure
Narrative → program
Sentence → insn
Verb → operation (add, multiply, load, branch)
Noun → data item (immediate, register value, memory value)
Adjective → addressing mode
[Figure: insn execution loop: Fetch, Decode, Read Inputs, Execute, Write Output, Next Insn]
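The six steps named above (Fetch, Decode, Read Inputs, Execute, Write Output, Next Insn) can be sketched as a software loop. This is only an illustration: the three-operand toy ISA, the tuple encoding, and the `run` function are assumptions made for the example, not LC3, MIPS, or x86.

```python
# Hypothetical toy ISA: each insn is a tuple (op, dst, src1, src2).
def run(program, regs):
    pc = 0
    while pc < len(program):
        insn = program[pc]                 # Fetch
        op, dst, src1, src2 = insn         # Decode
        if op == "halt":
            break
        a, b = regs[src1], regs[src2]      # Read Inputs
        if op == "add":                    # Execute
            result = a + b
        elif op == "mul":
            result = a * b
        else:
            raise ValueError(f"unknown operation: {op}")
        regs[dst] = result                 # Write Output
        pc += 1                            # Next Insn
    return regs


program = [
    ("add", 1, 2, 3),    # R1 = R2 + R3
    ("mul", 4, 1, 1),    # R4 = R1 * R1
    ("halt", 0, 0, 0),
]
final_regs = run(program, [0, 0, 5, 7, 0])
```

Each insn here finishes completely before the next one is fetched, which is the sequential behavior the loop describes.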
LC3
P37X
LC3 highlights
Similarities
16-bit data types
16-bit instructions, four-bit opcode
Similar TRAPs and devices
Differences
More ALU ops: Add, Sub, Mul, Or, Not, And, Xor, Shift Left/Right
No LDI, STI (indirect load/stores)
No condition codes
Compiler Optimizations
But also
Reduce branches and jumps (later)
Reduce cache misses (later)
Reduce dependences between nearby insns (later)
An ISA can make this difficult by having implicit dependences
Foreshadowing: Pipelining
[Figure: pipelining. Successive insns (Insn0, Insn1, ...) overlap in time, each passing through Fetch, Decode, Read Inputs, Execute, Write Output, Next Insn]
ISA Basics
Aspects of ISAs
RISC/CISC
Minimalist approach to an ISA: simple insns only
+ Low cycles/insn and seconds/cycle
- Higher insn/program, but hopefully not as much
Rely on compiler optimizations
Cycle/insn: 1 by definition
Seconds/cycle: proportional to complexity of datapath
ISA can make seconds/cycle high by requiring a complex datapath
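The three quantities named above (insn/program, cycles/insn, seconds/cycle) multiply together to give execution time. A minimal sketch of that tradeoff follows; every number in it is hypothetical and chosen only to show how a lower insn count can be outweighed by higher cycles/insn and seconds/cycle.

```python
def execution_time(insns_per_program, cycles_per_insn, seconds_per_cycle):
    """seconds/program = insn/program * cycles/insn * seconds/cycle."""
    return insns_per_program * cycles_per_insn * seconds_per_cycle


# Hypothetical "simple insns only" design: more insns, but cheap ones.
simple_time = execution_time(
    insns_per_program=1_200_000, cycles_per_insn=1.0, seconds_per_cycle=1.0e-9
)

# Hypothetical "complex insns" design: fewer insns, slower per insn and per cycle.
complex_time = execution_time(
    insns_per_program=1_000_000, cycles_per_insn=2.0, seconds_per_cycle=1.5e-9
)
```

With these made-up inputs the simple design takes about 1.2 ms and the complex one about 3 ms; different inputs can reverse the outcome.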
Von Neumann model
Data types and operations
Operand model
Control
Encoding
Operating system support
Multiprocessing support
Examples
LC3 (P37X)
MIPS
x86
Datatypes
LC3
MIPS
32(64) bit integer: add, sub, mul, div, shift, rotate, and, or, not, xor
32(64) bit floating-point: add, sub, mul, div
x86
32(64) bit integer: add, sub, mul, div, shift, rotate, and, or, not, xor
80-bit floating-point: add, sub, mul, div, sqrt
64-bit packed integer (MMX): padd, pmul
64(128)-bit packed floating-point (SSE/2): padd, pmul
BCD!!! (not really used obviously)
Memory
Registers
Immediates
LC3/MIPS/x86 Registers
LC3/P37X
MIPS
32 32-bit integer registers ($0 hardwired to 0)
32 32-bit floating-point registers (or 16 64-bit registers)
x86
8 8/16/32-bit integer registers (not general purpose)
No floating-point registers!
LC3/P37X
MIPS
32-bit
64-bit
x86
8086: 16-bit
80286: 24-bit
80386: 32-bit
AMD Opteron/Athlon64, Intel's newer Pentium4, Core 2: 64-bit
LC3
MIPS
3: general-purpose
add R1,R2,R3 means [R1] = [R2] + [R3]
to/from registers/accumulator/stack?
Assume displacement addressing for these examples
Registers: load/store
Accumulator: load/store
Stack: push/pop
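To make the three operand models above concrete, here is a sketch of the same computation, C = A + B, written for a register machine, an accumulator machine, and a stack machine. The mnemonics in the comments and the Python functions are hypothetical; each function returns the number of insns it models.

```python
def register_machine(mem):
    r = [0, 0, 0, 0]
    r[1] = mem["A"]            # load  R1, A
    r[2] = mem["B"]            # load  R2, B
    r[3] = r[1] + r[2]         # add   R3, R1, R2
    mem["C"] = r[3]            # store R3, C
    return 4


def accumulator_machine(mem):
    acc = mem["A"]             # load  A      (implicit destination: accumulator)
    acc = acc + mem["B"]       # add   B      (accumulator is source and destination)
    mem["C"] = acc             # store C
    return 3


def stack_machine(mem):
    stack = []
    stack.append(mem["A"])     # push A
    stack.append(mem["B"])     # push B
    b = stack.pop()            # add: pops two operands ...
    a = stack.pop()
    stack.append(a + b)        # ... and pushes the result
    mem["C"] = stack.pop()     # pop  C
    return 4
```

The insn counts differ, but so does the work each insn does: the accumulator and stack versions name fewer operands explicitly per insn.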
In most niches
Control Transfers
LC3
MIPS
x86
Integer (8 registers) reg-reg, reg-mem, mem-reg, but no mem-mem
Floating point: stack (why x86 floating-point sucked for years)
Note: integer push, pop for managing software stack
Note: also reg-mem and mem-mem string functions in hardware
x86-64
Integer/floating-point: 16 registers
The issues
PC-relative
Absolute
Position independent outside procedure
Used for procedure calls
branch-less-than R1,10,target
+ Simple
- Two ALUs: one for condition, one for target address
- Extra latency
subtract R2,R1,10 // sets negative CC
branch-neg target
+ Condition codes set for free
- Implicit dependence is tricky
set-less-than R2,R1,10
branch-not-equal-zero R2,target
- Additional insns
+ one ALU per insn, explicit dependence
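The three ways of testing a branch condition shown above (compare-and-branch, condition codes, condition in a register) can be compared side by side. The sketch below models "branch if R1 < 10" in each style; the function names are hypothetical, and each returns whether the branch is taken.

```python
def compare_and_branch(r1):
    # branch-less-than R1,10,target
    # One insn does both the comparison and the branch.
    return r1 < 10


def condition_codes(r1):
    # subtract R2,R1,10   (sets the negative CC as a side effect)
    r2 = r1 - 10
    negative_cc = r2 < 0       # implicit state: not named by either insn
    # branch-neg target   (reads the CC implicitly)
    return negative_cc


def condition_in_register(r1):
    # set-less-than R2,R1,10
    r2 = 1 if r1 < 10 else 0   # explicit state: the condition lives in R2
    # branch-not-equal-zero R2,target
    return r2 != 0
```

Python integers do not overflow, so this sketch ignores the overflow cases that a real subtract-based comparison has to handle. The point is only that the second style passes the condition through hidden state, while the third names it in a register.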
LC3
P37X
MIPS
16-bit offset PC-relative conditional branches (uses register for condition)
Compare 2 regs: beq, bne or reg to 0: bgtz, bgez, bltz, blez
+ Don't need adder for these, cover 80% of cases
Explicit set condition into registers: slt, sltu, slti, sltiu, etc.
26-bit target absolute jumps and function calls
x86
Encoding
Length
Fixed length
Most common is 32 bits
+ Simple implementation: next PC = PC+4
+ Longer reach for branch/jump targets
- Code density: 32 bits to increment a register by 1?
Variable length
- Complex implementation
+ Code density
Compromise: two lengths
MIPS16 or ARM's Thumb
A few simple encodings simplify decoder
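The implementation cost of variable-length encoding shows up in finding where the next insn starts. The sketch below contrasts "next PC = PC+4" with a made-up variable-length scheme in which the first byte of each insn gives its length in bytes; that scheme is an assumption for illustration and is not how x86 or any real ISA encodes length.

```python
def next_pc_fixed(pc):
    # Fixed 32-bit insns: the next insn always starts 4 bytes later.
    return pc + 4


def next_pc_variable(code, pc):
    # Hypothetical encoding: byte 0 of each insn holds its total length.
    length = code[pc]
    if length == 0:
        raise ValueError("zero-length insn")
    return pc + length


def insn_starts_fixed(code_size):
    # Every start address is known without looking at the code at all.
    return list(range(0, code_size, 4))


def insn_starts_variable(code):
    # Each start address depends on decoding the insn before it.
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc = next_pc_variable(code, pc)
    return starts


variable_code = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 4, 1, 2, 3])
variable_starts = insn_starts_variable(variable_code)
fixed_starts = insn_starts_fixed(16)
```

In the fixed case all start addresses can be computed independently; in the variable case they form a chain, which is one reason the decoder is harder to build.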
0-reg: Op(4) Offset(12)
1-reg: Op(4) R(3) Offset(9)
2-reg: Op(4) R(3) R(3) Offset(6)
3-reg: Op(4) R(3) R(3) U(3) R(3)
R-type: Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6)
I-type: Op(6) Rs(5) Rt(5) Immed(16)
J-type: Op(6) Target(26)
Op OpExt* ModRM* SIB* Disp*(1-4) Imm*(1-4)
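With a few fixed field layouts, decoding reduces to shifts and masks. The sketch below extracts the fields of the three 32-bit formats listed above, using the field widths given there (the six-field format is the one conventionally called R-type in MIPS). The function names are hypothetical; the target calculations follow the usual MIPS32 convention of word-aligned offsets relative to PC+4.

```python
def decode_r(word):
    # Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6)
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sh": (word >> 6) & 0x1F,
        "func": word & 0x3F,
    }


def decode_i(word):
    # Op(6) Rs(5) Rt(5) Immed(16)
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immed": word & 0xFFFF,
    }


def decode_j(word):
    # Op(6) Target(26)
    return {"op": (word >> 26) & 0x3F, "target": word & 0x3FFFFFF}


def sign_extend16(value):
    # Interpret a 16-bit field as a two's-complement number.
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, immed16):
    # PC-relative: the offset counts insns (words) from the insn after the branch.
    return (pc + 4 + (sign_extend16(immed16) << 2)) & 0xFFFFFFFF


def jump_target(pc, target26):
    # Absolute within a 256 MB region: upper 4 bits come from PC+4.
    return ((pc + 4) & 0xF0000000) | (target26 << 2)


add_fields = decode_r(0x00430820)    # add $1,$2,$3
addi_fields = decode_i(0x2041FFFF)   # addi $1,$2,-1
j_fields = decode_j(0x08000100)      # j to word address 0x100
```

Because the opcode and register fields sit at the same bit positions in every format, the register reads can start before the decoder knows which format it is looking at.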
LC3/MIPS/x86 OS Support
LC3
MIPS
Trap, return from trap
Exception coprocessor
Interrupts
x86
Trap, return from trap
Exception flags
Multi-level interrupts
Multiprocessing Support
Memory model
Atomic conditional reg/mem swap insns
Fence insns
LC3
No multiprocessing support
MIPS/x86
Yes, please
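An atomic conditional swap is the building block for locks. The sketch below simulates one in software and uses it as a spin lock protecting a shared counter. It is an illustration only: the `Memory` class and `compare_and_swap` method are hypothetical, a `threading.Lock` stands in for the atomicity that hardware would provide, and fence insns and the memory model are not modeled.

```python
import threading


class Memory:
    """Toy shared memory. _hw stands in for the hardware's atomicity guarantee."""

    def __init__(self):
        self.cells = {}
        self._hw = threading.Lock()

    def compare_and_swap(self, addr, expected, new):
        # Atomically: read addr; if it equals expected, write new. Return old value.
        with self._hw:
            old = self.cells.get(addr, 0)
            if old == expected:
                self.cells[addr] = new
            return old


def worker(mem, iterations):
    for _ in range(iterations):
        # Acquire: spin until we are the one who changes lock from 0 to 1.
        while mem.compare_and_swap("lock", 0, 1) != 0:
            pass
        # Critical section: a plain, non-atomic read-modify-write.
        mem.cells["counter"] = mem.cells.get("counter", 0) + 1
        # Release: change lock back from 1 to 0.
        mem.compare_and_swap("lock", 1, 0)


def run_workers(num_threads=4, iterations=200):
    mem = Memory()
    threads = [
        threading.Thread(target=worker, args=(mem, iterations))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.cells["counter"]
```

Calling `run_workers(4, 200)` should leave the counter at 800, since only the thread that wins the swap enters the critical section.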
RISC won the (technology) battle, CISC won the (commercial) war
Compatibility a stronger force than anyone (but Intel) thought
Intel & AMD beat RISC at its own game
[Figure: Fetch/Decode and Execute steps repeated over time, labeled Single-cycle and Pipelined, High-level ISA and Low-level ISA]
Simplify implementation
Facilitate optimizing compilation
Some guiding principles (tenets)
Single cycle execution/hard-wired control
Fixed instruction length, format
Lots of registers, load-store architecture, few addressing modes
The Debate
Compatibility
RISC argument
CISC rebuttal
CISC flaws not fundamental, can be fixed with more transistors
Moore's Law will narrow the RISC/CISC gap (true)
Good pipeline: RISC = 100K transistors, CISC = 300K
By 1995: 2M+ transistors had evened playing field
Software costs dominate, compatibility is paramount
Backward compatibility
New processors must support old programs (can't drop features)
Very important
Out-of-order execution (OoO) was very hard to do with a coarse-grain ISA like x86
Software emulation
Forward compatibility
Reserve set of trap opcodes (don't define uses)
Add ISA functionality by overloading traps
Release firmware patch to add to old implementation
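The forward-compatibility idea above can be sketched as a dispatch table: opcodes the old implementation does not define fall through to a trap table, and a later patch fills in a handler that emulates the new insn using old ones. All names here are hypothetical, and "mul" is just a stand-in for some insn added in a later ISA revision.

```python
# Insns the (hypothetical) old implementation executes directly.
HARDWARE_OPS = {
    "add": lambda a, b: a + b,
}

# Reserved opcodes trap; handlers can be installed later.
TRAP_HANDLERS = {}


def execute(op, a, b):
    if op in HARDWARE_OPS:
        return HARDWARE_OPS[op](a, b)
    handler = TRAP_HANDLERS.get(op)
    if handler is None:
        raise RuntimeError(f"illegal insn: {op}")
    return handler(a, b)     # trap: emulate in software


def emulate_mul(a, b):
    # Software emulation built only from the old "add" insn.
    # Assumes b is a non-negative integer.
    result = 0
    for _ in range(b):
        result = execute("add", result, a)
    return result


# The "firmware patch": give the reserved opcode a meaning on old hardware.
TRAP_HANDLERS["mul"] = emulate_mul

product = execute("mul", 6, 7)
```

The emulated insn is much slower than a hardware one would be, but old implementations keep running programs written for the newer ISA.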
Summary
[Figure: system layers: App, App, App / System software / Mem, CPU, I/O]
What is an ISA?
A functional contract