You are on page 1of 89

Laboratorio de Tecnologas de Informacin

Instruction Set Architecture


Arquitectura de Computadoras
Arturo Daz Prez Centro de Investigacin y de Estudios Avanzados del IPN Laboratorio de Tecnologas de Informacin adiaz@cinvestav.mx
ISA- 1

Arquitectura de Computadoras

Instruction Set

Laboratorio de Tecnologas de Informacin

... the attributes of a [computing] system as seen

by the programmer, i. e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.
Amdahl, Blaaw, Brooks, 1964.

Arquitectura de Computadoras

ISA- 2

Instruction Set Design

Laboratorio de Tecnologas de Informacin

software
instruction set

hardware

Which is easier to change/design?


Arquitectura de Computadoras ISA- 3

Instruction Set Architecture


... the attributes of a [computing] system as seen by the

Laboratorio de Tecnologas de Informacin

programmer, i. e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.
Amdahl, Blaaw, Brooks, 1964. Organization of programmable storage Data types & data structures: encodings and representations Instruction formats Instruction (or Operand Code) Set Modes of addressing and accessing data items and instructions Exceptional conditions
Arquitectura de Computadoras ISA- 4

ISA: What Must be Specified?


Instruction Format or Encoding how is it decoded?

Laboratorio de Tecnologas de Informacin

Instruction Fetch Instruction Decode Operand Fetch Execute Result Store Next Instruction
Arquitectura de Computadoras

Location of operands and result where other than memory? how many explicit operands? how are memory operands located? which can or cannot be in memory? Data type and Size Operations what are supported Successor instruction jumps, conditions, branches fetch-decode-execute is implicit!
ISA- 5

Evolution of Instruction Sets


Single Accumulator (EDSAC 1950) Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953 Separation of Programming Model from Implementation High-level Language Based (B5000 1963) Concept of a Family IBM 360 1964

Laboratorio de Tecnologas de Informacin

General Purpose Register Machines Complex Instruction Sets (Vax, Intel 432 1977-80) Load/Store Architecture (CDC 6600, Cray 1 1963-76) RISC: MIPS, Sparc, 88000, IBM RS6000, ... 1987
Arquitectura de Computadoras ISA- 6

Basic ISA Classes


Accumulator:
1 address add A 1+x address addx A

Laboratorio de Tecnologas de Informacin

acc acc + mem[A] acc acc + mem[A + x] tos tos + next EA(A) EA(A) + EA(B) EA(A) EA(B) + EA(C)

Stack
0 address add 2 address add A B 3 address add A B C

General Purpose register

Load/Store
3 address add Ra Rb Rc Ra Rb + Rc load Ra Rb Ra mem[Rb] store Ra Rb mem[Rb] Ra
Arquitectura de Computadoras

Most real machines are hybrids of those

ISA- 7

Comparing Number of Instructions

Laboratorio de Tecnologas de Informacin

Comparison: Bytes per instruction? Number of Instructions? Cycles per instruction?

Code sequence for (C = A + B) for four classes of instruction sets:

Stack
Push A Push B Add Pop C

Accumulator
Load A Add B Store C

Register (register-memory)
Load R1,A Add R1,B Store C, R1

Register (load-store)
Load R1,A Load R2,B Add R3,R1,R2 Store C,R3
ISA- 8

Arquitectura de Computadoras

General Purpose Registers Dominate


1975-200x all machines use general purpose

Laboratorio de Tecnologas de Informacin

registers Advantages of registers


registers are faster than memory registers are easier for a compiler to use
e.g., (A*B) (C*D) (E*F) can do multiplies in any order vs. stack

registers can hold variables


memory traffic is reduced, so program is sped up (since registers are faster than memory) code density improves (since register named with fewer bits than memory location)
Arquitectura de Computadoras ISA- 9

Caches vs. Registers


Registers advantages
Faster (no addressing mode, no tags) Deterministic (no misses) Can duplicate for two ports Short identifier (3-8 bits)

Laboratorio de Tecnologas de Informacin

Register disadvantages
Must save/restore on procedure calls Cant take the address of a register Fixed size (FP, strings, structures) Compiler must control (?)

Arquitectura de Computadoras

ISA- 10

Caches vs. Registers (contd)


How many registers? More means

Laboratorio de Tecnologas de Informacin

+ Hold operands longer (reducing memory traffic & potentially execution time) - Longer register specifiers (except with register windows) - Slow registers - More state slows context switches

Arquitectura de Computadoras

ISA- 11

MIPS I Registers

Laboratorio de Tecnologas de Informacin

Programmable storage
232 x bytes of memory 31 x 32-bit GPRs (R0 = 0) 32 x 32-bit FP regs (paired DP) HI, LO, PC

r0 r1 r31 PC lo hi

Arquitectura de Computadoras

ISA- 12

Typical Operations
Data Movement Load (from memory) Store (to memory) memory-to-memory move register-to-register move input (from I/O device) output (to I/O device) push, pop (to/from stack)

Laboratorio de Tecnologas de Informacin

Arithmetic Shift Logical

integer (binary + decimal) or FP Add, Subtract, Multiply, Divide shift left/right, rotate left/right not, and, or, set, clear

Arquitectura de Computadoras

ISA- 13

Typical Operations

Laboratorio de Tecnologas de Informacin

Control (Jump/Branch) Subroutine Linkage Interrupt Synchronization String Graphics (MMX)

unconditional, conditional call, return trap, return test & set (atomic r-m-w) search, translate parallel subword ops (4 16bit add)

little change since 1960


Arquitectura de Computadoras ISA- 14

Top 10 80x86 Instructions


Rank instruction 1 2 3 4 5 6 7 8 9 10 load conditional branch compare store add and sub call return Total
Arquitectura de Computadoras

Laboratorio de Tecnologas de Informacin

Integer Average Percent total executed 22% 20% 16% 12% 8% 6% 5% 1% 1% 96%
ISA- 15

move register-register 4%

Simple instructions dominate instruction frequency

Operation Summary
Support these simple instructions, since they
load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return;
Arquitectura de Computadoras

Laboratorio de Tecnologas de Informacin

will dominate the number of instructions executed:

ISA- 16

Operands for ALU instructions


ALU instructions combine operands (e.g. ADD) Number of explicit operands
Two - destination equals one source Three - orthogonal

Laboratorio de Tecnologas de Informacin

Operands in registers or memory


Any combination -- VAX
(orthogonal, but variable instr. formats)

At least one register -- much of 360


(not orthogonal)

All registers -- CRAY, DLX, RISCs


(orthogonal, but needs loads/stores)

Arquitectura de Computadoras

ISA- 17

Memory Addressing

Laboratorio de Tecnologas de Informacin

Since 1980 almost every machine uses

addresses to level of 8-bits (byte) 2 questions for design of ISA:


Since could read a 32-bit word as four loads of bytes from sequential byte addresses or as one load word from a single byte address,
How do byte addresses map onto words? Can a word be placed on any byte boundary?
Arquitectura de Computadoras ISA- 18

Addressing Objects: Endian Wars


Big Endian:

Laboratorio de Tecnologas de Informacin

address of most significant byte = word address (xx00 = Big End of word)
IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA

Little Endian: address of least significant byte = word address

(xx00 = Little End of word)


Intel 80x86, DEC Vax, DEC Alpha (Windows NT) Mode selectable becoming more common: PowerPC, MIPS R10000 little endian byte 0 3 msb 0 big endian byte 0
Arquitectura de Computadoras

0 lsb

3
ISA- 19

Addressing Objects: Alignment


multiple of their size.

Laboratorio de Tecnologas de Informacin

Alignment: require that objects fall on address that is

0 Aligned

Not Aligned

Arquitectura de Computadoras

ISA- 20

Alignment
No

Laboratorio de Tecnologas de Informacin

restrictions
Simpler software Hardware must detect misalignment and make 2 memory accesses expensive logic, slows down all references sometimes required for backward compatibility

Restrictred alignment software must guarantee alignment hardware only detecs misalignment and traps trap handler does it Middle group misaligned data ok but requires multiple instructions compiler must skill know still trap on misaligned access

Arquitectura de Computadoras

ISA- 21

A typical RISC
32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take pair) 3-address, reg-reg arithmetic instruction

Laboratorio de Tecnologas de Informacin

Single address mode for load/store: base+displacement


no indirection

Simple branch conditions Delay branch

see: SPARC, MIPS MC88100, AMD2900, i960, i860, PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3, ...
Arquitectura de Computadoras ISA- 22

VAX-11
Byte 0 OpCode 1 A/M n A/M m A/M

Laboratorio de Tecnologas de Informacin

Variable format, 2 and 3 address instruction 32-bit word size, 16 GPR (four reserved) Rich set of addressing modes (apply to any operand) Rich set of operations
bit-field, stack, call, case, loop, string, poly, system)

Rich set of data types (B, W, L, Q, O, F, D, G, H) Condition codes

Arquitectura de Computadoras

ISA- 23

VAX-11: Addressing Modes


1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.

Laboratorio de Tecnologas de Informacin

Register Base + Displacement Immediate Register Indirect Direct (absolute) Base + Index Scaled Index Autoincrement Autodecrement Memory Indirec [Indirection chains]

Ri M[Ri + v] v M[Ri] M[v] M[Ri + Rj] M[Ri + Rj*d + v] M[Ri++] M[Ri--] M[ M[Ri] ]

Memory Register File

Modes 1-4 account for 93 % of all operands on the VAX


Arquitectura de Computadoras ISA- 24

Addressing Mode Usage

Laboratorio de Tecnologas de Informacin

3 programs measured on machine with all address modes

(VAX) --- Displacement: 42% avg, 32% to 55% 75% 85% --- Immediate: 33% avg, 17% to 43% --- Register deferred (indirect): 13% avg, 3% to 24% --- Scaled: 7% avg, 0% to 16% --- Memory indirect: 3% avg, 1% to 6% --- Misc: 2% avg, 0% to 3% 75% displacement & immediate 85% displacement, immediate & register indirect
Arquitectura de Computadoras ISA- 25

Displacement Address Size?


Int. Avg. 30% 25% 20% 15% 10% 5% 0% 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Address Bits

Laboratorio de Tecnologas de Informacin

FP Avg.

Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs 1% of addresses > 16-bits 12 - 16 bits of displacement needed
Arquitectura de Computadoras ISA- 26

Immediate Size?

Laboratorio de Tecnologas de Informacin

50% to 60% fit within 8 bits 75% to 80% fit within 16 bits

Arquitectura de Computadoras

ISA- 27

Addressing Summary

Laboratorio de Tecnologas de Informacin

Data Addressing modes that are important:


Displacement, Immediate, Register Indirect

Displacement size should be 12 to 16 bits Immediate size should be 8 to 16 bits

Arquitectura de Computadoras

ISA- 28

Generic Example of Instruction Format Widths

Laboratorio de Tecnologas de Informacin

Variable:

Fixed:

Hybrid:

Arquitectura de Computadoras

ISA- 29

Instruction Formats
If code size is most important,

Laboratorio de Tecnologas de Informacin

use variable length instructions


If performance is most important,

use fixed length instructions


Recent embedded machines (ARM, MIPS) added

optional mode to execute subset of 16-bit wide instructions (Thumb, MIPS16); per procedure decide performance or density
Some architectures actually exploring on-the-fly

decompression for more density.


Arquitectura de Computadoras ISA- 30

Instruction Format

Laboratorio de Tecnologas de Informacin

If have many memory operands per instruction and/or many addressing modes: =>Need one address specifier per operand If have load-store machine with 1 address per instr. and one or two addressing modes: => Can encode addressing mode in the opcode

Arquitectura de Computadoras

ISA- 31

MIPS Addressing Modes/Instruction Formats


All instructions 32 bits wide
Register (direct) op rs rt rd

Laboratorio de Tecnologas de Informacin

register Immediate Base+index op op rs rs rt rt immed immed + immed +


ISA- 32

Memory

register PC-relative op rs PC rt

Memory

Register Indirect?
Arquitectura de Computadoras

Most Popular ISA of all time: Intel 80x86


1971: Intel invents microprocessor 4004/8008, 8080 in 1975

Laboratorio de Tecnologas de Informacin

1975: Gordon Moore realized one more chance for new ISA before

ISA locked in for decades


hired CS people in Oregon werent ready in 1977 (CS people did 432 in 1980) started crash effort for 16-bit microcomputer 1978: 8086 dedicated registers, segmented address, 16 bit 8088; 8-bit external bus version of 8086 1980: IBM selects 8088 as basis for IBM PC 1980: 8087 floating point coprocessor: adds 60 instructions using

hybrid stack/register scheme 1982: 80286 24-bit address, protection, memory mapping 1985: 80386 32-bit address, 32-bit GP registers, paging
Arquitectura de Computadoras ISA- 33

Intel x86 (IA-32)


1989: 80486 & Pentium in 1992: faster + MP few instructions 1997: MMX multimedia extensions 200X: Superseded by IA-64 (Merced, McKinley, Itanium, etc.) Difficult to explain and impossible to love See H&P Appendix D.8 Eight 32-bit registers (EAX, EBX, ..., but also ESP, EBP) Also 16- and 8-bit version (AX, AH, AL)

Laboratorio de Tecnologas de Informacin

Most instructions have two operands, one possibly from memory One super-duper addressing mode w/ effective address = base_reg + (index_reg * scaling_factor) + displacement Many formats: see H&P fig. D.8

Arquitectura de Computadoras

ISA- 34

Intel MMX
Micro, 8/96]

Laboratorio de Tecnologas de Informacin

MultiMedia eXtension to IA-32 [Peleg & Weiser, IEEE


Multimedia data values often need much less than 32 bits But are organized in groups (e.g. red/green/blue) So in 64-bit FP registers: 2x32, 4x16, 8x8

E.g. ADDB (for byte)


17 +17 ----- 34 87 13 --100 100 200 ----255 ... ... ... ... 6 more 6 more ----------6 more

MMX takes 16-element dot product (a0*b0 + a1*b1 + ... +

a15*b15)

from 200 to 16 instructions & from 76 to 12 cycles (6x)


Arquitectura de Computadoras ISA- 35

Typical Operations

Laboratorio de Tecnologas de Informacin

Control (Jump/Branch) Subroutine Linkage Interrupt Synchronization String Graphics (MMX)

unconditional, conditional call, return trap, return test & set (atomic r-m-w) search, translate parallel subword ops (4 16bit add)

little change since 1960


Arquitectura de Computadoras ISA- 36

Control Instructions

Laboratorio de Tecnologas de Informacin

Taken or not taken? Conditional branches Jumps Procedure calls Procedure returns O.S. calls O.O. returns X

Where is the target? X X X X X X

Link return address

Save or restore state

X X

X X

Arquitectura de Computadoras

ISA- 37

(1) Taken or not taken ?


Compare and branch instruction
+ + No extra compare instruction No state passed between instructions Requires ALU operation Restricts code scheduling opportunities

Laboratorio de Tecnologas de Informacin

Implicitly set condition codes (Z, N, V, C)


+ Can be set for free - Constrains code reordering - Extra state to save and restore

Explicitly set condition codes (Z, N, V, C)


+ Can be set for free + Decouples branch/fetch from pipeline - Extra state to save and restore
Arquitectura de Computadoras ISA- 38

(1) Taken or not taken ?, cont.


Condition in general-purpose register

Laboratorio de Tecnologas de Informacin

+ No special state to save and implement but uses up a register - branch condition separated from branch logic in pipeline

Some data for MIPS


> 80 % of compares for branches use immediates > 80 % of these immediates are zero 50 % compares for branches are =0 or != 0

Compromise used in MIPS


Have branch-if = 0 and branch-if != 0 Have compare instructions (r1=r2, r1 != r2, r1 < r2, r1 <= r2, etc.)

With pipelining, can we predict whether taken ?


Statically ? Dynamically ?
Arquitectura de Computadoras ISA- 39

(2) Where is the target ?


Could use Arbitrary Specifier ?
+ Orthogonal and powerful - More bits to specify, more time to decode - branch execution and target separated in pipeline

Laboratorio de Tecnologas de Informacin

PC-relative with immediate


+ Position independence (helps linking), target computable in branch unit + Short immediate sufficient. MIPS word immediate: <= 4 bits: 47 % <= 8 bits: 94 % <= 12 bits: 100 % - Target must be known statically (to link) - Cant jump arbitrarily far - Other techniques are required for returns and distance jumps Arquitectura de Computadoras

ISA- 40

(2) Where is the target ?, cont.


Register
+ Short specification + Can jump anywhere + Dynamic target okay (returns) - Extra instruction to load register

Laboratorio de Tecnologas de Informacin

Common compromise
(Conditional) branches (pc-rel) (Unconditional) jumps (pc-rel, reg) Procedure calls (pc-rel, reg) Procedure returns (reg) O.S. calls (trap) O.S. returns (reg)

(Vectored) Trap
Critical for O.S. calls + Protection. - Implementation headache

Arquitectura de Computadoras

ISA- 41

(3) Link return address ?


Required for procedure calls and O.S. calls
Implicit register + Fast, simple - SW must save register before next call - Surprise traps or interrups ? Explicit register - No important advantages over above - Register must be specified Many recent architectures use implicit register
Arquitectura de Computadoras

Laboratorio de Tecnologas de Informacin

Processor stack
+ Recursion supported directly Complex instruction

ISA- 42

(4) Save or restore state ?


What state ?
Procedure calls: registers O.S. calls: registers and PSW (incl. CCs)

Laboratorio de Tecnologas de Informacin

Hardware need not save registers


Caller can save registers in use Callee can save registers it will use

Hardware register save


Which (IBM STM, VAX CALLS) ? Is the above faster ? Register windows
Many recent architectures do no register saving or do implicit saving with register windows
Arquitectura de Computadoras ISA- 43

MIPS: Register State


32 integer registers
$0 is hardwared to 0 $31 is the return address register software convention for other registers

Laboratorio de Tecnologas de Informacin

32 single-precision FP

registers or 16 doubleprecision FP registers


PC and other special registers
Arquitectura de Computadoras ISA- 44

MIPS I Operation Overview


Arithmetic Logical:
Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI SLL, SRL, SRA, SLLV, SRLV, SRAV

Laboratorio de Tecnologas de Informacin

Memory Access:
LB, LBU, LH, LHU, LW, LWL,LWR SB, SH, SW, SWL, SWR

Arquitectura de Computadoras

ISA- 45

Multiply / Divide
Start multiply, divide
MULT rs, rt MULTU rs, rt DIV rs, rt DIVU rs, rt

Laboratorio de Tecnologas de Informacin

Registers

Move result from multiply, divide


MFHI rd MFLO rd

Move to HI or LO
MTHI rd MTLO rd

HI

LO

Why not third field for destination?


(Hint: how many clock cycles for multiply or divide vs. add?)
Arquitectura de Computadoras ISA- 46

Data Types
Bit: 0, 1 Bit String: sequence of bits of a particular length 4 bits is a nibble 8 bits is a byte 16 bits is a half-word 32 bits is a word 64 bits is a double-word Character: ASCII 7 bit code UNICODE 16 bit code Decimal: digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte Integers: 2's Complement Floating Point: Single Precision Double Precision Extended Precision
Arquitectura de Computadoras

Laboratorio de Tecnologas de Informacin

exponent MxR mantissa E base

How many +/- #'s? Where is decimal pt? How are +/- exponents represented?
ISA- 47

Operand Size Usage


Doubleword Word Halfword Byte 19% 0% 7% 0% 0% 20% 40% 60% 80% 0% 69% 74% 31%

Laboratorio de Tecnologas de Informacin

Int Avg. FP Avg.

Frequency of reference by size

Support for these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers
Arquitectura de Computadoras ISA- 48

MIPS arithmetic instructions


Instruction add subtract add immediate add unsigned subtract unsigned add imm. unsign. multiply multiply unsigned divide divide unsigned Move from Hi Move from Lo Example add $1,$2,$3 sub $1,$2,$3 addi $1,$2,100 addu $1,$2,$3 subu $1,$2,$3 addiu $1,$2,100 mult $2,$3 multu$2,$3 div $2,$3 divu $2,$3 mfhi $1 mflo $1 Meaning $1 = $2 + $3 $1 = $2 $3 $1 = $2 + 100 $1 = $2 + $3 $1 = $2 $3 $1 = $2 + 100 Hi, Lo = $2 x $3 Hi, Lo = $2 x $3 Lo = $2 $3, Hi = $2 mod $3 Lo = $2 $3, Hi = $2 mod $3 $1 = Hi $1 = Lo

Laboratorio de Tecnologas de Informacin

Comments 3 operands; exception possible 3 operands; exception possible + constant; exception possible 3 operands; no exceptions 3 operands; no exceptions + constant; no exceptions 64-bit signed product 64-bit unsigned product Lo = quotient, Hi = remainder Unsigned quotient & remainder Used to get copy of Hi Used to get copy of Lo

Arquitectura de Computadoras

ISA- 49

MIPS logical instructions


Instruction and or xor nor and immediate or immediate xor immediate shift left logical shift right logical shift right arithm. shift left logical shift right logical shift right arithm. Example and $1,$2,$3 or $1,$2,$3 xor $1,$2,$3 nor $1,$2,$3 andi $1,$2,10 ori $1,$2,10 xori $1, $2,10 sll $1,$2,10 srl $1,$2,10 sra $1,$2,10 sllv $1,$2,$3 srlv $1,$2, $3 srav $1,$2, $3 Meaning $1 = $2 & $3 $1 = $2 | $3 $1 = $2 $3 $1 = ~($2 |$3) $1 = $2 & 10 $1 = $2 | 10 $1 = ~$2 &~10 $1 = $2 << 10 $1 = $2 >> 10 $1 = $2 >> 10 $1 = $2 << $3 $1 = $2 >> $3 $1 = $2 >> $3

Laboratorio de Tecnologas de Informacin

Comment 3 reg. operands; Logical AND 3 reg. operands; Logical OR 3 reg. operands; Logical XOR 3 reg. operands; Logical NOR Logical AND reg, constant Logical OR reg, constant Logical XOR reg, constant Shift left by constant Shift right by constant Shift right (sign extend) Shift left by variable Shift right by variable Shift right arith. by variable

Arquitectura de Computadoras

ISA- 50

MIPS data transfer instructions


Instruction SW 500(R4), R3 SH 502(R2), R3 SB 41(R3), R2 LW R1, 30(R2) LH R1, 40(R3) LHU R1, 40(R3) LB R1, 40(R3) LBU R1, 40(R3) LUI R1, 40 Comment Store word Store half Store byte Load word Load halfword Load halfword unsigned Load byte Load byte unsigned

Laboratorio de Tecnologas de Informacin

Load Upper Immediate (16 bits shifted left by 16)


LUI R5 R5 0000 0000

Arquitectura de Computadoras

ISA- 51

When does MIPS sign extend?


Examples of sign extending 8 bits to 16 bits: 00001010 00000000 00001010 10001100 11111111 10001100
When is an immediate value sign extended?

Laboratorio de Tecnologas de Informacin

When value is sign extended, copy upper bit to full value:

Arithmetic instructions (add, sub, etc.) sign extend immediates even for the unsigned versions of the instructions! Logical instructions do not sign extend

Load/Store half or byte do sign extend, but unsigned

versions do not.
Arquitectura de Computadoras ISA- 52

Methods of Testing Condition

Laboratorio de Tecnologas de Informacin

Condition Codes Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.

ex: add r1, r2, r3 bz label


Condition Register Ex: cmp r1, r2, r3

bgt r1, label


Compare and Branch

Ex: bgt r1, r2, label

Arquitectura de Computadoras

ISA- 53

Conditional Branch Distance


Int. Avg. 40% 30% 20% 10% 0% 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Bits of Branch Dispalcement 15 FP Avg.

Laboratorio de Tecnologas de Informacin

25% of integer branches are 2 to 4 instructions

Arquitectura de Computadoras

ISA- 54

Conditional Branch Addressing


At least 8 bits suggested (128 instructions)

Laboratorio de Tecnologas de Informacin

PC-relative since most branches are relatively close to the current PC Compare Equal/Not Equal most important for integer programs (86%)

LT/GE GT/LE EQ/NE 0%

7% 40% 7% 23% 86% 37% 50% 100% Int Avg. FP Avg.

Frequency of comparison types in branches Arquitectura de Computadoras

ISA- 55

MIPS Compare and Branch


Compare and Branch
BEQ rs, rt, offset BNE rs, rt, offset BLEZ rs, offset BGTZ rs, offset BLT BGEZ BLTZAL rs, offset BGEZAL

Laboratorio de Tecnologas de Informacin

if R[rs] == R[rt] then PC-relative branch <> if R[rs] <= 0 then PC-relative branch > < >= if R[rs] < 0 then branch and link (into R 31) >=!

Compare to zero and Branch

Remaining set of compare and branch ops take two

instructions Almost all comparisons are against zero!


Arquitectura de Computadoras ISA- 56

MIPS jump, branch, compare instructions


Instruction branch on equal branch on not eq. set on less than set less than imm. set less than uns. set l. t. imm. uns. jump jump register jump and link
Arquitectura de Computadoras

Laboratorio de Tecnologas de Informacin

Example Meaning beq $1,$2,100 if ($1 == $2) go to PC+4+100 Equal test; PC relative branch bne $1,$2,100 if ($1!= $2) go to PC+4+100 Not equal test; PC relative slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; 2s comp. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; 2s comp. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; unsigned numbers sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; unsigned numbers j 10000 go to 10000 Jump to target address jr $31 go to $31 For switch, procedure return jal 10000 $31 = PC + 4; go to 10000 For procedure call

ISA- 57

Signed vs. Unsigned Comparison


R1= 000

Laboratorio de Tecnologas de Informacin

0000 0000 0000 0001two R2= 000 0000 0000 0000 0010 two R3= 111 1111 1111 1111 1111 two After executing these instructions: slt r4,r2,r1 ; if (r2 < r1) r4=1; else r4=0 slt r5,r3,r1 ; if (r3 < r1) r5=1; else r5=0 sltu r6,r2,r1 ; if (r2 < r1) r6=1; else r6=0 sltu r7,r3,r1 ; if (r3 < r1) r7=1; else r7=0
What are values of registers r4 - r7? Why?

r4 =

; r5 =

; r6 =

; r7 =

;
ISA- 58

Arquitectura de Computadoras

Calls: Why Are Stacks So Great?


Stacking of Subroutine Calls & Returns and Environments: A CALL B B: CALL C C: RET A B RET A A B C A B

Laboratorio de Tecnologas de Informacin

A:

Some machines provide a memory stack as part of the architecture (e.g., VAX) Sometimes stacks are implemented via software convention (e.g., MIPS)
Arquitectura de Computadoras ISA- 59

Memory Stacks
Useful for stacked environments/subroutine call & return even if operand stack not part of architecture Stacks that Grow Up vs. Stacks that Grow Down: Next Empty? c b a inf. Big grows up 0 Little

Laboratorio de Tecnologas de Informacin

grows down

Memory Addresses

SP

Last Full?

0 Little How is empty stack represented? Little --> Big/Last Full POP: PUSH: Read from Mem(SP) Decrement SP

inf. Big

Little --> Big/Next Empty POP: PUSH: Decrement SP Read from Mem(SP) Write to Mem(SP) Increment SP

Increment SP Write to Mem(SP) Arquitectura de Computadoras

ISA- 60

Call-Return Linkage: Stack Frames


ARGS Reference args and local variables at fixed (positive) offset from FP

Laboratorio de Tecnologas de Informacin

High Mem

Callee Save Registers (old FP, RA)

Local Variables FP Grows and shrinks during expression evaluation SP Low Mem

Many variations on stacks possible (up/down, last pushed / next ) Compilers normally keep scalar variables in registers, not memory!
Arquitectura de Computadoras ISA- 61

MIPS: Software conventions for Registers


0 1 2 3 4 5 6 7 8 ... 15 t7
Arquitectura de Computadoras

Laboratorio de Tecnologas de Informacin

zero constant 0 at reserved for assembler

16 s0 callee saves . . . (callee must save) 23 s7 24 t8 25 t9 26 k0 reserved for OS kernel 27 k1 28 gp Pointer to global area temporary (contd)

v0 expression evaluation & v1 function results a0 arguments a1 a2 a3 t0 temporary: caller saves (callee can clobber)

29 sp Stack pointer 30 fp 31 ra frame pointer Return Address (HW)


ISA- 62

MIPS / GCC Calling Conventions


fact: addiu sw sw addiu ... sw ... lw lw addiu jr
FP

Laboratorio de Tecnologas de Informacin

$sp, $sp, -32 $ra, 20($sp) $fp, 16($sp) $fp, $sp, 32 $a0, 0($fp) $31, 20($sp) $fp, 16($sp) $sp, $sp, 32 $31

SP ra FP SP ra

low address

ra old FP

FP SP

First four arguments passed in registers.


Arquitectura de Computadoras

ra old FP
ISA- 63

Details of the MIPS instruction set

Laboratorio de Tecnologas de Informacin

Register zero always has the value zero (even if you try to write it) Branch/jump and link put the return addr. PC+4 or 8 into the link

register (R31) (depends on logical vs physical architecture) All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, ) Immediate arithmetic and logical instructions are extended as follows:
logical immediates ops are zero extended to 32 bits arithmetic immediates ops are sign extended to 32 bits (including addu) The data loaded by the instructions lb and lh are extended as follows: lbu, lhu are zero extended lb, lh are sign extended Overflow can occur in these arithmetic and logical instructions: add, sub, addi it cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu Arquitectura de Computadoras ISA- 64

Delayed Branches
li sub bz r3, #7 r4, r4, 1 r4, LL

Laboratorio de Tecnologas de Informacin

addi r5, r3, 1 subi r6, r6, 2 LL: slt r1, r3, r5

In the Raw MIPS, the instruction after the branch is

executed even when the branch is taken?


This is hidden by the assembler for the MIPS virtual machine allows the compiler to better utilize the instruction pipeline (???)
Arquitectura de Computadoras ISA- 65

Branch & Pipelines


Time li r3, #7 sub r4, r4, 1 bz r4, LL addi r5, r3, 1 LL: slt r1, r3, r5 Branch Target execute ifetch execute ifetch execute ifetch Branch execute ifetch

Laboratorio de Tecnologas de Informacin

Delay Slot execute

By the end of Branch instruction, the CPU knows whether or not the branch will take place. However, it will have fetched the next instruction by then, regardless of whether or not a branch will be taken. Why not execute it?
Arquitectura de Computadoras ISA- 66

Filling Delayed Branches


Branch: IF
DEC & OP fetch Execute DEC & OP fetch Execute

Laboratorio de Tecnologas de Informacin

execute successor IF even if branch taken! Then branch target or continue

IF Single delay slot impacts the critical path

Compiler can fill a single delay slot with a useful instruction 50% of the time. try to move down from above jump move up from target, if safe

add r3, r1, r2 sub r4, r4, 1 bz r4, LL NOP ... LL: add rd, ...

Is this violating the ISA abstraction?


Arquitectura de Computadoras ISA- 67

Miscellaneous MIPS I instructions


break

Laboratorio de Tecnologas de Informacin

control syscall
coprocessor instrs. TLB instructions restore from exception

kernel/user load word left/right store word left/right

A breakpoint trap occurs, transfers to exception handler A system trap occurs, transfers control to exception handler Support for floating point Support for virtual memory: discussed later Restores previous interrupt mask & mode bits into status register Supports misaligned word loads Supports misaligned word stores

Arquitectura de Computadoras

ISA- 68

MIPS: Instruction Set Format


load/store architecture with 3 explicit operands (ALU ops) fixed 32-bit instructions 3 instruction formats
R-Type I-Type J-Type

Laboratorio de Tecnologas de Informacin

6 instruction set groups:


load/store - data movement operations computational - arithmetic, logical, and shift operations jump/branch - including call and returns coprocessor - FP instructions coprocessor0 - memory management and exception handling special - accessing special registers, system calls, breakpoint instructions, etc.
ISA- 69

Arquitectura de Computadoras

R2000/3000 Instruction Formats


R-type (register)

Laboratorio de Tecnologas de Informacin

e.g. add $8, $17, $18

# $8 = $17 + $18

31

26 25

21 20

16 15

11 10

6 5

OpCode

rs

rt

rd

shamt

funct

31

26 25

21 20

16 15

11 10

6 5

17

18

32

Arquitectura de Computadoras

ISA- 70

R2000/3000 Instruction Formats


I-type (immediate) e.g. addi $8, $17, -44 lw $8, -44($17) beq $17, $8, label
31 26 25 21 20 16 15

Laboratorio de Tecnologas de Informacin

# $8 = $17 -44 # $8 = M[$17 - 44] # if( $8 == $17) go to label:


0

OpCode
31 26 25

rs
21 20

rt
16 15

immediate
0

op

17

-44

Arquitectura de Computadoras

ISA- 71

R2000/3000 Instruction Formats

Laboratorio de Tecnologas de Informacin

J-type (jump) e.g. jump label

# call label: ; $31 = $pc + 8


0

31

26 25

OpCode

target

31

26 25

-44

Arquitectura de Computadoras

ISA- 72

Bhandarkar and Clark: RISC vs. CISC

Laboratorio de Tecnologas de Informacin

Compares the VAX 8700 vs MIPS M/2000 (R3000 chip) Combines three fractors:
Architecture Implementation Compilers and OS

Argues that:
Implementation effects are second order Compilers are similar RISCs are better than CISCs

Is it a fair comparison of RISCs vs CISCs ?


Arquitectura de Computadoras ISA- 73

Bhandarkar and Clark, cont.


RISC factor
Risc Factor = CPIvax Instrvax CPImips Instrmips

Laboratorio de Tecnologas de Informacin

Bechmark spice2g6 matrix300 nasa7 fpppp tomcatv dudoc espresso eqntott li geo. mean
Arquitectura de Computadoras

Inst. Ratio 2.5 2.4 2.1 2.9 2.9 2.7 1.7 1.1 1.6 2.2

CPI CPI Ratio RISC MIPS VAX factor 1.8 8.0 4.4 1.8 3.0 13.8 4.5 1.9 3.0 15.0 5.0 2.4 1.5 15.2 10.5 2.7 2.1 17.5 8.2 2.9 1.7 13.2 7.9 3.0 1.1 5.4 5.1 3.0 1.3 4.4 3.5 3.3 1.1 6.5 6.0 3.7 1.7 9.9 5.8 2.7
ISA- 74

Bhandarkar and Clark, cont.


Compensation Factors
Increase VAX CPI but decrease VAX instruction count Increase MIPS instruction count Example 1: Loads and stores vs. operand specifiers Example 2: Necessary complex operations, e.g. loop branches

Laboratorio de Tecnologas de Informacin

Factors favoring VAX


Big immediate values Not-taken branches incur no delay

Arquitectura de Computadoras

ISA- 75

Bhandarkar and Clark, cont.


Factors favoring MIPS
Operand specifier decoding Number of registers Separating floating point unit Simple jumps and branches (lower latency) Fancy VAX instructions: Unnecessary functionality Instruction scheduling Translation buffer Branch displacement size

Laboratorio de Tecnologas de Informacin

Arquitectura de Computadoras

ISA- 76

Homework Assignment 3

Laboratorio de Tecnologas de Informacin

Make a review of the ISA for a specific processor Write a three page report explaining the following: 1. Kind of IS Architecture (RISC, CISC or other, 16-bit, 322. 3. 4. 5. 6.

bit, 64-bit) Classes of instructions (ALU, Memory Movement, Branches, etc.) Addressing Modes (immediate, base+displacemente, indirect, etc.) Displacements in branches and control flow instructions (call, ret) Special instructions: system calls, traps, access to special purpose registers Instructions formats

Arquitectura de Computadoras

ISA- 77

Hw3: List of Processors

Laboratorio de Tecnologas de Informacin

1 Claudia Mndez Garza 3 Vctor Echeverra Ros

Xscale o ARM Texas Instruments TMS320DM64x o un DSP

2 Jos Alberto Ramrez Uresti Opteron AMD 64 bits

Due date: September 26th, 2008.

Arquitectura de Computadoras

ISA- 78

Laboratorio de Tecnologas de Informacin

Lminas complementarias no expuestas en la clase que pueden servir de soporte

Arquitectura de Computadoras

ISA- 79

VAX-11
Introduced by DEC in 1977: VAX

Laboratorio de Tecnologas de Informacin

11/780 Upward compatible from PDP-11 32-bit word and addresses Virtual memory is first-class 16 GPRs (r15 is PC, r14 is SP), CCs Extremely orthogonal, memorymemory Decode as byte stream
Opcode: operation, number of operands & operand type Variable-length address specifiers

Arquitectura de Computadoras

ISA- 80

VAX-11, cont.
Data types
8-, 16-, 32-, 64-, 128-bit integers F (32 bits), D (64), G (64), H (128) FP Character string (8-bits/char) Decimal (4-bits/digit) Numeric string (8-bits/digit)

Laboratorio de Tecnologas de Informacin

Addresing modes include


Literal (6 bits) 8-, 16-, 32-bit immediates Register, register deferred 8-, 16-, 32-bit displacements 8-, 16-, 32-bit displacement deferred
Arquitectura de Computadoras

Indexed Autoincrement Autodecrement Autoincrement deferred

ISA- 81

VAX-11, cont.
Operations
Data Transfers (including string move) Arithmetic and Logical (2 and 3 operands) Control (Branch, Jump, etc. )

Laboratorio de Tecnologas de Informacin

AOBLEQ (Add one and Branch if Less than or EQual)

Procedure (CALLs save state) Bit Manipulation Floating Poing (Add/Sub/Mult/Divide) POLYF -- Polynomial Evaluation System (Exception, VM) Other
CRC -- Cycle redundant chech INSQUE -- Insert entry in queue

Arquitectura de Computadoras

ISA- 82

VAX-11, cont.
8 bits
New instruction: opcode calls for three operands Specifier 1: four bits + register

Laboratorio de Tecnologas de Informacin

PC+0 PC+1

VAX has too many modes & formats Serial semantics limit parallel execution
The big deal with RISC is not REDUCED number of instructions; its few modes & formats to facilitate pipelining
ISA- 83

+ four byte displacements

PC+6

Specifier 2 + two byte displacements

PC+9

Specifier 3: index Specifier 3: indexed mode + four byte displacement

PC+15

next instruction

Arquitectura de Computadoras

DEC Alpha
Introduced by DEC in 1992
Ability to emulate VAX instructions important Strongly influenced by Cray-1

Laboratorio de Tecnologas de Informacin

64-bit architecture Load/Store -- only displacement

addressing Standard datatypes


No byte loads/stores

Registers
32 64-bit GPRs (r31 = 0) 32 64-bit FPRs

VAX and IEEE floating point


Arquitectura de Computadoras ISA- 84

DEC Alpha, cont.


Four fixed-length instruction formats
Sub-formats for computation instructions 32-bit instructions Designed with multiple-issue in mind No delayed branches

Laboratorio de Tecnologas de Informacin

Precise exceptions not automatic PAL code

Arquitectura de Computadoras

ISA- 85

DEC Alpha Instructions Formats


Memory Format
31 26 25 21 20 16 15 0

Laboratorio de Tecnologas de Informacin

OpCode src/dest PC-Relative Format


31 26 25 21 20

base

displacement

OpCode PAL-call Format


31 26 25

src

displacement

OpCode

PAL argument

Arquitectura de Computadoras

ISA- 86

DEC ALPHA Instruction Formats, cont.


Three-Register Integer Format
31 26 25 21 20 16 15 12 11 5 4 0

Laboratorio de Tecnologas de Informacin

OpCode

src1

src2

000 0

function

dest

Eight-bit Immediate Integer Format


31 26 25 21 20 16 15 12 11 5 4 0

OpCode

src1

const

function

dest

Eight-bit Immediate Integer Format


31 26 25 21 20 16 15 5 4 0

OpCode

src1

src2

function

dest

Arquitectura de Computadoras

ISA- 87

DEC Alpha Instruction Set


Operate Instructions
Integer Arithmetic Logical (AND, OR, conditional MOV) Byte-manipulation Floating-point arithmetic Miscellaneous (memory prefetching, trap and memory barriers

Laboratorio de Tecnologas de Informacin

Load/Store Instructions
Load/Store Quadwords (64-bits) Load-Linked/Store Conditional (for MP synchronization)
Arquitectura de Computadoras ISA- 88

DEC Alpha Instruction Set, cont.


Control/Branching Instruction
Branch on condition (8 conditions) in integer register Branch on condition (6 conditions) in FP register Unconditional branches Calculated jumps

Laboratorio de Tecnologas de Informacin

Branch Hints
Different hint/rule type of branch

Supervision Instructions
PAL code for needed task
Arquitectura de Computadoras ISA- 89

You might also like