Professional Documents
Culture Documents
Arquitectura de Computadoras
Instruction Set
by the programmer, i. e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.
Amdahl, Blaaw, Brooks, 1964.
Arquitectura de Computadoras
ISA- 2
software
instruction set
hardware
programmer, i. e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.
Amdahl, Blaaw, Brooks, 1964. Organization of programmable storage Data types & data structures: encodings and representations Instruction formats Instruction (or Operand Code) Set Modes of addressing and accessing data items and instructions Exceptional conditions
Arquitectura de Computadoras ISA- 4
Instruction Fetch Instruction Decode Operand Fetch Execute Result Store Next Instruction
Arquitectura de Computadoras
Location of operands and result where other than memory? how many explicit operands? how are memory operands located? which can or cannot be in memory? Data type and Size Operations what are supported Successor instruction jumps, conditions, branches fetch-decode-execute is implicit!
ISA- 5
General Purpose Register Machines Complex Instruction Sets (Vax, Intel 432 1977-80) Load/Store Architecture (CDC 6600, Cray 1 1963-76) RISC: MIPS, Sparc, 88000, IBM RS6000, ... 1987
Arquitectura de Computadoras ISA- 6
acc acc + mem[A] acc acc + mem[A + x] tos tos + next EA(A) EA(A) + EA(B) EA(A) EA(B) + EA(C)
Stack
0 address add 2 address add A B 3 address add A B C
Load/Store
3 address add Ra Rb Rc Ra Rb + Rc load Ra Rb Ra mem[Rb] store Ra Rb mem[Rb] Ra
Arquitectura de Computadoras
ISA- 7
Stack
Push A Push B Add Pop C
Accumulator
Load A Add B Store C
Register (register-memory)
Load R1,A Add R1,B Store C, R1
Register (load-store)
Load R1,A Load R2,B Add R3,R1,R2 Store C,R3
ISA- 8
Arquitectura de Computadoras
Register disadvantages
Must save/restore on procedure calls Cant take the address of a register Fixed size (FP, strings, structures) Compiler must control (?)
Arquitectura de Computadoras
ISA- 10
+ Hold operands longer (reducing memory traffic & potentially execution time) - Longer register specifiers (except with register windows) - Slow registers - More state slows context switches
Arquitectura de Computadoras
ISA- 11
MIPS I Registers
Programmable storage
232 x bytes of memory 31 x 32-bit GPRs (R0 = 0) 32 x 32-bit FP regs (paired DP) HI, LO, PC
r0 r1 r31 PC lo hi
Arquitectura de Computadoras
ISA- 12
Typical Operations
Data Movement Load (from memory) Store (to memory) memory-to-memory move register-to-register move input (from I/O device) output (to I/O device) push, pop (to/from stack)
integer (binary + decimal) or FP Add, Subtract, Multiply, Divide shift left/right, rotate left/right not, and, or, set, clear
Arquitectura de Computadoras
ISA- 13
Typical Operations
unconditional, conditional call, return trap, return test & set (atomic r-m-w) search, translate parallel subword ops (4 16bit add)
Integer Average Percent total executed 22% 20% 16% 12% 8% 6% 5% 1% 1% 96%
ISA- 15
move register-register 4%
Operation Summary
Support these simple instructions, since they
load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return;
Arquitectura de Computadoras
ISA- 16
Arquitectura de Computadoras
ISA- 17
Memory Addressing
address of most significant byte = word address (xx00 = Big End of word)
IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
0 lsb
3
ISA- 19
0 Aligned
Not Aligned
Arquitectura de Computadoras
ISA- 20
Alignment
No
restrictions
Simpler software Hardware must detect misalignment and make 2 memory accesses expensive logic, slows down all references sometimes required for backward compatibility
Restrictred alignment software must guarantee alignment hardware only detecs misalignment and traps trap handler does it Middle group misaligned data ok but requires multiple instructions compiler must skill know still trap on misaligned access
Arquitectura de Computadoras
ISA- 21
A typical RISC
32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair) 3-address, reg-reg arithmetic instruction
see: SPARC, MIPS MC88100, AMD2900, i960, i860, PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3, ...
Arquitectura de Computadoras ISA- 22
VAX-11
Byte 0 OpCode 1 A/M n A/M m A/M
Variable format, 2 and 3 address instruction 32-bit word size, 16 GPR (four reserved) Rich set of addressing modes (apply to any operand) Rich set of operations
bit-field, stack, call, case, loop, string, poly, system)
Arquitectura de Computadoras
ISA- 23
Register Base + Displacement Immediate Register Indirect Direct (absolute) Base + Index Scaled Index Autoincrement Autodecrement Memory Indirec [Indirection chains]
Ri M[Ri + v] v M[Ri] M[v] M[Ri + Rj] M[Ri + Rj*d + v] M[Ri++] M[Ri--] M[ M[Ri] ]
(VAX) --- Displacement: 42% avg, 32% to 55% 75% 85% --- Immediate: 33% avg, 17% to 43% --- Register deferred (indirect): 13% avg, 3% to 24% --- Scaled: 7% avg, 0% to 16% --- Memory indirect: 3% avg, 1% to 6% --- Misc: 2% avg, 0% to 3% 75% displacement & immediate 85% displacement, immediate & register indirect
Arquitectura de Computadoras ISA- 25
FP Avg.
Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs 1% of addresses > 16-bits 12 - 16 bits of displacement needed
Arquitectura de Computadoras ISA- 26
Immediate Size?
50% to 60% fit within 8 bits 75% to 80% fit within 16 bits
Arquitectura de Computadoras
ISA- 27
Addressing Summary
Arquitectura de Computadoras
ISA- 28
Variable:
Fixed:
Hybrid:
Arquitectura de Computadoras
ISA- 29
Instruction Formats
If code size is most important,
optional mode to execute subset of 16-bit wide instructions (Thumb, MIPS16); per procedure decide performance or density
Some architectures actually exploring on-the-fly
Instruction Format
If have many memory operands per instruction and/or many addressing modes: =>Need one address specifier per operand If have load-store machine with 1 address per instr. and one or two addressing modes: => Can encode addressing mode in the opcode
Arquitectura de Computadoras
ISA- 31
Memory
register PC-relative op rs PC rt
Memory
Register Indirect?
Arquitectura de Computadoras
1975: Gordon Moore realized one more chance for new ISA before
hybrid stack/register scheme 1982: 80286 24-bit address, protection, memory mapping 1985: 80386 32-bit address, 32-bit GP registers, paging
Arquitectura de Computadoras ISA- 33
Most instructions have two operands, one possibly from memory One super-duper addressing mode w/ effective address = base_reg + (index_reg * scaling_factor) + displacement Many formats: see H&P fig. D.8
Arquitectura de Computadoras
ISA- 34
Intel MMX
Micro, 8/96]
a15*b15)
Typical Operations
unconditional, conditional call, return trap, return test & set (atomic r-m-w) search, translate parallel subword ops (4 16bit add)
Control Instructions
Taken or not taken? Conditional branches Jumps Procedure calls Procedure returns O.S. calls O.O. returns X
X X
X X
Arquitectura de Computadoras
ISA- 37
+ No special state to save and implement but uses up a register - branch condition separated from branch logic in pipeline
ISA- 40
Common compromise
(Conditional) branches (pc-rel) (Unconditional) jumps (pc-rel, reg) Procedure calls (pc-rel, reg) Procedure returns (reg) O.S. calls (trap) O.S. returns (reg)
(Vectored) Trap
Critical for O.S. calls + Protection. - Implementation headache
Arquitectura de Computadoras
ISA- 41
Processor stack
+ Recursion supported directly Complex instruction
ISA- 42
32 single-precision FP
Memory Access:
LB, LBU, LH, LHU, LW, LWL,LWR SB, SH, SW, SWL, SWR
Arquitectura de Computadoras
ISA- 45
Multiply / Divide
Start multiply, divide
MULT rs, rt MULTU rs, rt DIV rs, rt DIVU rs, rt
Registers
Move to HI or LO
MTHI rd MTLO rd
HI
LO
Data Types
Bit: 0, 1 Bit String: sequence of bits of a particular length 4 bits is a nibble 8 bits is a byte 16 bits is a half-word 32 bits is a word 64 bits is a double-word Character: ASCII 7 bit code UNICODE 16 bit code Decimal: digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte Integers: 2's Complement Floating Point: Single Precision Double Precision Extended Precision
Arquitectura de Computadoras
How many +/- #'s? Where is decimal pt? How are +/- exponents represented?
ISA- 47
Support for these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers
Arquitectura de Computadoras ISA- 48
Comments 3 operands; exception possible 3 operands; exception possible + constant; exception possible 3 operands; no exceptions 3 operands; no exceptions + constant; no exceptions 64-bit signed product 64-bit unsigned product Lo = quotient, Hi = remainder Unsigned quotient & remainder Used to get copy of Hi Used to get copy of Lo
Arquitectura de Computadoras
ISA- 49
Comment 3 reg. operands; Logical AND 3 reg. operands; Logical OR 3 reg. operands; Logical XOR 3 reg. operands; Logical NOR Logical AND reg, constant Logical OR reg, constant Logical XOR reg, constant Shift left by constant Shift right by constant Shift right (sign extend) Shift left by variable Shift right by variable Shift right arith. by variable
Arquitectura de Computadoras
ISA- 50
Arquitectura de Computadoras
ISA- 51
Arithmetic instructions (add, sub, etc.) sign extend immediates even for the unsigned versions of the instructions! Logical instructions do not sign extend
versions do not.
Arquitectura de Computadoras ISA- 52
Condition Codes Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.
Arquitectura de Computadoras
ISA- 53
Arquitectura de Computadoras
ISA- 54
PC-relative since most branches are relatively close to the current PC Compare Equal/Not Equal most important for integer programs (86%)
ISA- 55
if R[rs] == R[rt] then PC-relative branch <> if R[rs] <= 0 then PC-relative branch > < >= if R[rs] < 0 then branch and link (into R 31) >=!
Example Meaning beq $1,$2,100 if ($1 == $2) go to PC+4+100 Equal test; PC relative branch bne $1,$2,100 if ($1!= $2) go to PC+4+100 Not equal test; PC relative slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; 2s comp. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; 2s comp. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; unsigned numbers sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; unsigned numbers j 10000 go to 10000 Jump to target address jr $31 go to $31 For switch, procedure return jal 10000 $31 = PC + 4; go to 10000 For procedure call
ISA- 57
0000 0000 0000 0001two R2= 000 0000 0000 0000 0010 two R3= 111 1111 1111 1111 1111 two After executing these instructions: slt r4,r2,r1 ; if (r2 < r1) r4=1; else r4=0 slt r5,r3,r1 ; if (r3 < r1) r5=1; else r5=0 sltu r6,r2,r1 ; if (r2 < r1) r6=1; else r6=0 sltu r7,r3,r1 ; if (r3 < r1) r7=1; else r7=0
What are values of registers r4 - r7? Why?
r4 =
; r5 =
; r6 =
; r7 =
;
ISA- 58
Arquitectura de Computadoras
A:
Some machines provide a memory stack as part of the architecture (e.g., VAX) Sometimes stacks are implemented via software convention (e.g., MIPS)
Arquitectura de Computadoras ISA- 59
Memory Stacks
Useful for stacked environments/subroutine call & return even if operand stack not part of architecture Stacks that Grow Up vs. Stacks that Grow Down: Next Empty? c b a inf. Big grows up 0 Little
grows down
Memory Addresses
SP
Last Full?
0 Little How is empty stack represented? Little --> Big/Last Full POP: PUSH: Read from Mem(SP) Decrement SP
inf. Big
Little --> Big/Next Empty POP: PUSH: Decrement SP Read from Mem(SP) Write to Mem(SP) Increment SP
ISA- 60
High Mem
Local Variables FP Grows and shrinks during expression evaluation SP Low Mem
Many variations on stacks possible (up/down, last pushed / next ) Compilers normally keep scalar variables in registers, not memory!
Arquitectura de Computadoras ISA- 61
16 s0 callee saves . . . (callee must save) 23 s7 24 t8 25 t9 26 k0 reserved for OS kernel 27 k1 28 gp Pointer to global area temporary (contd)
v0 expression evaluation & v1 function results a0 arguments a1 a2 a3 t0 temporary: caller saves (callee can clobber)
$sp, $sp, -32 $ra, 20($sp) $fp, 16($sp) $fp, $sp, 32 $a0, 0($fp) $31, 20($sp) $fp, 16($sp) $sp, $sp, 32 $31
SP ra FP SP ra
low address
ra old FP
FP SP
ra old FP
ISA- 63
Register zero always has the value zero (even if you try to write it) Branch/jump and link put the return addr. PC+4 or 8 into the link
register (R31) (depends on logical vs physical architecture) All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, ) Immediate arithmetic and logical instructions are extended as follows:
logical immediates ops are zero extended to 32 bits arithmetic immediates ops are sign extended to 32 bits (including addu) The data loaded by the instructions lb and lh are extended as follows: lbu, lhu are zero extended lb, lh are sign extended Overflow can occur in these arithmetic and logical instructions: add, sub, addi it cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu Arquitectura de Computadoras ISA- 64
Delayed Branches
li sub bz r3, #7 r4, r4, 1 r4, LL
addi r5, r3, 1 subi r6, r6, 2 LL: slt r1, r3, r5
By the end of Branch instruction, the CPU knows whether or not the branch will take place. However, it will have fetched the next instruction by then, regardless of whether or not a branch will be taken. Why not execute it?
Arquitectura de Computadoras ISA- 66
Compiler can fill a single delay slot with a useful instruction 50% of the time. try to move down from above jump move up from target, if safe
add r3, r1, r2 sub r4, r4, 1 bz r4, LL NOP ... LL: add rd, ...
control syscall
coprocessor instrs. TLB instructions restore from exception
A breakpoint trap occurs, transfers to exception handler A system trap occurs, transfers control to exception handler Support for floating point Support for virtual memory: discussed later Restores previous interrupt mask & mode bits into status register Supports misaligned word loads Supports misaligned word stores
Arquitectura de Computadoras
ISA- 68
Arquitectura de Computadoras
# $8 = $17 + $18
31
26 25
21 20
16 15
11 10
6 5
OpCode
rs
rt
rd
shamt
funct
31
26 25
21 20
16 15
11 10
6 5
17
18
32
Arquitectura de Computadoras
ISA- 70
OpCode
31 26 25
rs
21 20
rt
16 15
immediate
0
op
17
-44
Arquitectura de Computadoras
ISA- 71
31
26 25
OpCode
target
31
26 25
-44
Arquitectura de Computadoras
ISA- 72
Compares the VAX 8700 vs MIPS M/2000 (R3000 chip) Combines three fractors:
Architecture Implementation Compilers and OS
Argues that:
Implementation effects are second order Compilers are similar RISCs are better than CISCs
Bechmark spice2g6 matrix300 nasa7 fpppp tomcatv dudoc espresso eqntott li geo. mean
Arquitectura de Computadoras
Inst. Ratio 2.5 2.4 2.1 2.9 2.9 2.7 1.7 1.1 1.6 2.2
CPI CPI Ratio RISC MIPS VAX factor 1.8 8.0 4.4 1.8 3.0 13.8 4.5 1.9 3.0 15.0 5.0 2.4 1.5 15.2 10.5 2.7 2.1 17.5 8.2 2.9 1.7 13.2 7.9 3.0 1.1 5.4 5.1 3.0 1.3 4.4 3.5 3.3 1.1 6.5 6.0 3.7 1.7 9.9 5.8 2.7
ISA- 74
Arquitectura de Computadoras
ISA- 75
Arquitectura de Computadoras
ISA- 76
Homework Assignment 3
Make a review of the ISA for a specific processor Write a three page report explaining the following: 1. Kind of IS Architecture (RISC, CISC or other, 16-bit, 322. 3. 4. 5. 6.
bit, 64-bit) Classes of instructions (ALU, Memory Movement, Branches, etc.) Addressing Modes (immediate, base+displacemente, indirect, etc.) Displacements in branches and control flow instructions (call, ret) Special instructions: system calls, traps, access to special purpose registers Instructions formats
Arquitectura de Computadoras
ISA- 77
Arquitectura de Computadoras
ISA- 78
Arquitectura de Computadoras
ISA- 79
VAX-11
Introduced by DEC in 1977: VAX
11/780 Upward compatible from PDP-11 32-bit word and addresses Virtual memory is first-class 16 GPRs (r15 is PC, r14 is SP), CCs Extremely orthogonal, memorymemory Decode as byte stream
Opcode: operation, number of operands & operand type Variable-length address specifiers
Arquitectura de Computadoras
ISA- 80
VAX-11, cont.
Data types
8-, 16-, 32-, 64-, 128-bit integers F (32 bits), D (64), G (64), H (128) FP Character string (8-bits/char) Decimal (4-bits/digit) Numeric string (8-bits/digit)
ISA- 81
VAX-11, cont.
Operations
Data Transfers (including string move) Arithmetic and Logical (2 and 3 operands) Control (Branch, Jump, etc. )
Procedure (CALLs save state) Bit Manipulation Floating Poing (Add/Sub/Mult/Divide) POLYF -- Polynomial Evaluation System (Exception, VM) Other
CRC -- Cycle redundant chech INSQUE -- Insert entry in queue
Arquitectura de Computadoras
ISA- 82
VAX-11, cont.
8 bits
New instruction: opcode calls for three operands Specifier 1: four bits + register
PC+0 PC+1
VAX has too many modes & formats Serial semantics limit parallel execution
The big deal with RISC is not REDUCED number of instructions; its few modes & formats to facilitate pipelining
ISA- 83
PC+6
PC+9
PC+15
next instruction
Arquitectura de Computadoras
DEC Alpha
Introduced by DEC in 1992
Ability to emulate VAX instructions important Strongly influenced by Cray-1
Registers
32 64-bit GPRs (r31 = 0) 32 64-bit FPRs
Arquitectura de Computadoras
ISA- 85
base
displacement
src
displacement
OpCode
PAL argument
Arquitectura de Computadoras
ISA- 86
OpCode
src1
src2
000 0
function
dest
OpCode
src1
const
function
dest
OpCode
src1
src2
function
dest
Arquitectura de Computadoras
ISA- 87
Load/Store Instructions
Load/Store Quadwords (64-bits) Load-Linked/Store Conditional (for MP synchronization)
Arquitectura de Computadoras ISA- 88
Branch Hints
Different hint/rule type of branch
Supervision Instructions
PAL code for needed task
Arquitectura de Computadoras ISA- 89