CompOrg Ch2

Chap. 2 Instructions: Language of the Computer
Prof. W.-G. Teng, "Computer Organization & Assembly Language", NCKU ES 2
Review: Execution Cycle
Fetch -> Decode -> Exec
[Figure: Processor (Control, Datapath), Memory, Devices (Input, Output)]
Memory stores both instructions and data
The datapath executes the instructions as directed by control
Example instruction: 000000 00100 00010 00010 00000 100000
contents Reg #4 ADD contents Reg #2, results put in Reg #2
Assembly Language Instructions
Language of the machine
More primitive than higher level languages
Very restrictive (e.g., MIPS arithmetic instructions)
Stored-program concept
Instructions and data of many types can be stored in
memory as binary numbers
We'll be working with the MIPS instruction set
architecture
similar to other architectures developed since the 1980's
used by NEC, Nintendo, Silicon Graphics, Sony, ...
Design goals
maximize performance, minimize cost, reduce design time
minimize memory space (embedded systems)
minimize power consumption (mobile systems)
Sec. 2.1 Introduction
MIPS Arithmetic Instruction
MIPS assembly language arithmetic statement
add $t0, $s1, $s2 # $t0 = $s1 + $s2
sub $t0, $s1, $s2 # $t0 = $s1 - $s2
Each arithmetic instruction
Performs only one operation
Specifies exactly three operands
Meaning: destination ← source1 op source2
Operand order is fixed (destination first)
Design Principle 1: Simplicity favors regularity
Sec. 2.2 Operations of the Computer Hardware
Compiling More Complex Statements
Assuming variables f, g, h, i and j are assigned to
the registers $s0, $s1, $s2, $s3 and $s4,
respectively. What is the compiled MIPS code to the C
statement?
f = (g + h) - (i + j)
add $t0, $s1, $s2
add $t1, $s3, $s4
sub $s0, $t0, $t1
Registers
Bricks of the CPU
Visible to hardware and to programmer
High-speed storage for operands
Easy to name
Also used for addressing memory
Not all registers are equal
Some are for special purpose (e.g., $0 in MIPS is hardwired
to 0)
Some are used for integer and some for floating-point
Some are restricted by convention (i.e., $s0, $s1, ... for
variables and $t0, $t1, ... for temporary registers)
Sec. 2.3 Operands of the Computer Hardware
Registers (cont'd)
Most current computers have 32 or 64 registers
Why no more than 32 or 64?
Well... sometimes there are (e.g., SPARC, CRAY)
Need to save registers
Memory is a hierarchy of devices with faster and more
expensive ones closer to CPU
Design Principle 2: Smaller is faster
Memory Level       Capacity (bytes)   Access Time        Relative Time
Register           100s to 1000s      Nanoseconds        1
Cache (on-chip)    8k                 Nanoseconds        1-2
Cache (off-chip)   1M                 10s of ns.         5-10
Primary Memory     10M to 1G          10s to 100s ns.    10-100
Secondary Memory   1G to 100G         10s of ms.         1,000,000
Registers vs. Memory
Arithmetic instruction operands must be registers
Compiler associates variables with registers
What about programs with lots of variables?
Accessing Memory
MIPS has two basic data transfer instructions for accessing
memory
lw $t0, 4($s3) # load word from memory
sw $t0, 8($s3) # store word to memory
The data transfer instruction must specify
memory address: where in memory to read from (load) or write to
(store)
register destination (source): where in the register file to write to
(load) or read from (store)
The memory address is formed by summing the constant portion
of the instruction and the contents of the second register (i.e.,
offset + base address)
Information Units
Basic unit is the bit (stores a 0 or a 1)
1 byte = 8 bits
1 word = 4 bytes
Memory is an array of information units
Each unit has the same size
Each unit has a unique address
Address and contents are different (obvious!)
Address   Data
0         1
1         101
2         10
3         100
Addressing Words
In MIPS, words must start at addresses that are
multiples of 4 (i.e., alignment restriction)
Addresses of consecutive words: 0, 4, 8, 12, ...
Example of Accessing Memory
Assume variable h is associated with register $s2 and
the base register of array A is $s3. Compile the C
statement: A[12] = h + A[8]
lw $t0, 32($s3)
add $t0, $s2, $t0
sw $t0, 48($s3)
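The offsets 32 and 48 come from byte addressing: each array element is a 4-byte word, so the offset is the index times 4. A minimal C sketch of that arithmetic follows, assuming 4-byte words as in MIPS; the names word_offset and effective_address are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of element `index` in an array of 4-byte words
   (hypothetical helper, not part of the slides). */
uint32_t word_offset(uint32_t index) {
    return index * 4u;
}

/* Address formed by lw/sw: contents of the base register plus the
   sign-extended 16-bit constant in the instruction. */
uint32_t effective_address(uint32_t base, int16_t offset) {
    uint32_t addr = base + (uint32_t)(int32_t)offset;
    assert(addr % 4u == 0);  /* MIPS alignment restriction for words */
    return addr;
}
```

For A[8] and A[12], word_offset gives 32 and 48, matching the constants in the lw and sw instructions.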
Constant Operand
To add the constant 4 to register $s3 (assuming that
the constants are placed in memory already)
lw $t0, AddrConstant4($s1)
add $s3, $s3, $t0
Instruction of add immediate
addi $s3, $s3, 4
Since MIPS supports negative constants, there is no
need for subtract immediate
Design Principle 3: Make the common case fast
Unsigned Binary Integers
Given an n-bit number
x = x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_1 2^1 + x_0 2^0
Range: 0 to +2^n - 1
Example
0000 0000 0000 0000 0000 0000 0000 1011_2
= 0 + \cdots + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0
= 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}
Using 32 bits
0 to +4,294,967,295

Sec. 2.4 Signed and Unsigned Numbers
Two's Complement Signed Integers
Given an n-bit number
x = -x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_1 2^1 + x_0 2^0
Range: -2^{n-1} to +2^{n-1} - 1
Example
1111 1111 1111 1111 1111 1111 1111 1100_2
= -1 \times 2^{31} + 1 \times 2^{30} + \cdots + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0
= -2,147,483,648 + 2,147,483,644 = -4_{10}
Using 32 bits
-2,147,483,648 to +2,147,483,647
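The only difference between the unsigned and the two's complement interpretation is the weight of the top bit: +2^(n-1) versus -2^(n-1). The C sketch below evaluates both sums bit by bit; the function names are hypothetical and the width n is assumed to be between 1 and 32.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned value: sum of x_i * 2^i for i = 0 .. n-1. */
uint64_t unsigned_value(uint32_t bits, int n) {
    assert(n >= 1 && n <= 32);
    uint64_t sum = 0;
    for (int i = 0; i < n; i++) {
        if ((bits >> i) & 1u) sum += (uint64_t)1 << i;
    }
    return sum;
}

/* Two's complement value: same sum, but bit n-1 has weight -2^(n-1). */
int64_t twos_complement_value(uint32_t bits, int n) {
    assert(n >= 1 && n <= 32);
    int64_t sum = 0;
    for (int i = 0; i < n - 1; i++) {
        if ((bits >> i) & 1u) sum += (int64_t)1 << i;
    }
    if ((bits >> (n - 1)) & 1u) sum -= (int64_t)1 << (n - 1);
    return sum;
}
```

With n = 32, the pattern ending in 1011 evaluates to 11 as an unsigned number, and the pattern 1111 ... 1100 evaluates to -4 in two's complement.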

Two's Complement Representation
To represent both positive and negative numbers
0000 0000 0000 0000 0000 0000 0000 0000_two = 0_ten
0000 0000 0000 0000 0000 0000 0000 0001_two = 1_ten
...
0111 1111 1111 1111 1111 1111 1111 1111_two = 2,147,483,647_ten (most positive)
1000 0000 0000 0000 0000 0000 0000 0000_two = -2,147,483,648_ten (most negative)
1000 0000 0000 0000 0000 0000 0000 0001_two = -2,147,483,647_ten
...
1111 1111 1111 1111 1111 1111 1111 1111_two = -1_ten
MSB is the sign bit
Leading 0 means positive or zero
Leading 1 means negative
Signed Negation
Complement and add 1
Complement means 1 -> 0, 0 -> 1
x + \bar{x} = 1111...111_2 = -1
\bar{x} + 1 = -x
Example: negate +2
(1) invert the bits
+2 = 0000 0000 ... 0010_2
\bar{x} = 1111 1111 ... 1101_2
(2) add one
-2 = 1111 1111 ... 1101_2 + 1
= 1111 1111 ... 1110_2
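The two steps can be checked directly on 32-bit patterns. This is a small C sketch, assuming unsigned arithmetic so that wraparound is well defined; negate_bits is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Negate a 32-bit two's complement pattern: invert the bits, then add one. */
uint32_t negate_bits(uint32_t x) {
    uint32_t inverted = (uint32_t)~x;
    /* x plus its complement is all 1s, i.e. -1 in two's complement */
    assert((uint32_t)(x + inverted) == 0xFFFFFFFFu);
    return (uint32_t)(inverted + 1u);
}
```

Applying negate_bits to the pattern for +2 gives 1111 ... 1110, and applying it again returns the pattern for +2.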
Sign Extension Shortcut
Convert 16-bit binary versions of 2_ten and -2_ten to 32-bit binary numbers
0000 0000 0000 0010_two = 2_ten (16-bit binary version)
0000 0000 0000 0000 0000 0000 0000 0010_two = 2_ten
1111 1111 1111 1110_two = -2_ten
1111 1111 1111 1111 1111 1111 1111 1110_two = -2_ten
Two's complement: the unsigned sum of an n-bit number and its negative is 2^n (i.e., x and 2^n - x)
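The shortcut amounts to copying the sign bit (bit 15) into every one of the upper 16 bits. A minimal C sketch, with sign_extend16 as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit pattern to 32 bits by replicating bit 15
   into the upper 16 bits. */
uint32_t sign_extend16(uint16_t half) {
    uint32_t word = half;
    if (half & 0x8000u) {
        word |= 0xFFFF0000u;
    }
    assert((uint16_t)word == half);  /* the lower 16 bits are unchanged */
    return word;
}
```

The 16-bit pattern for 2 keeps its leading zeros, while the pattern for -2 (1111 1111 1111 1110) gains sixteen leading ones.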
Translating into a Machine Instruction
Given the MIPS instruction:
add $t0, $s1, $s2
The decimal representation is
0        17       18       8        0        32
The binary representation is
000000   10001    10010    01000    00000    100000
6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
         $s1      $s2      $t0
Sec. 2.5 Representing Instructions in the Computer
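Packing the six fields into one 32-bit word is a matter of shifting each field to its position. The following C sketch assumes the field widths 6, 5, 5, 5, 5 and 6 bits given on the slide; encode_r is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-format fields into one 32-bit instruction word. */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct) {
    assert(op < 64 && rs < 32 && rt < 32);
    assert(rd < 32 && shamt < 32 && funct < 64);
    return (op << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}
```

For add $t0, $s1, $s2 the call encode_r(0, 17, 18, 8, 0, 32) yields 0x02324020, which is the binary representation above written in hexadecimal.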
Binary-to-Hexadecimal and Back
Convert the following numbers into the other base:
eca8 6420_hex = 1110 1100 1010 1000 0110 0100 0010 0000_two
0001 0011 0101 0111 1001 1011 1101 1111_two = 1357 9bdf_hex
MIPS Fields
op: basic operation of the instruction (i.e., the opcode)
rs: the first register source operand
rt: the second register source operand
rd: the register destination operand
shamt: shift amount (will be discussed later)
funct: selects the specific variant of the operation in the op field
Field order: op | rs | rt | rd | shamt | funct
MIPS Fields (cont'd)
All MIPS instructions are 32 bits long
A problem occurs if all instructions are designed to fit this instruction format
e.g., if a 5-bit field is used for the constant of a load word instruction, the constant is limited to 2^5 = 32 values (0 to 31)
Design Principle 4: Good design demands good
compromises
Design Principles
1. Simplicity favors regularity
2. Smaller is faster
3. Make the common case fast
4. Good design demands good compromises
Instruction Types
R-type or R-format
I-type or I-format
op       rs       rt       constant or address
6 bits   5 bits   5 bits   16 bits
MIPS Instruction Encoding
instruction   format   op       rs    rt    rd     shamt   funct    address
add           R        0        reg   reg   reg    0       32_ten   n.a.
sub           R        0        reg   reg   reg    0       34_ten   n.a.
addi          I        8_ten    reg   reg   n.a.   n.a.    n.a.     constant
lw            I        35_ten   reg   reg   n.a.   n.a.    n.a.     address
sw            I        43_ten   reg   reg   n.a.   n.a.    n.a.     address
Example of Instruction Encoding
The statement:
A[300] = h + A[300]
is compiled into
lw $t0, 1200($t1)
add $t0, $s2, $t0
sw $t0, 1200($t1)
op   rs   rt   rd   address/shamt   funct
35   9    8    1200
0    18   8    8    0               32
43   9    8    1200
List of registers
Example of Instruction Encoding (cont'd)
op       rs      rt      rd      address/shamt   funct
35       9       8       1200
0        18      8       8       0               32
43       9       8       1200
op       rs      rt      rd      address/shamt   funct
100011   01001   01000   0000 0100 1011 0000
000000   10010   01000   01000   00000           100000
101011   01001   01000   0000 0100 1011 0000
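The lw and sw words in this example can be reproduced by packing the I-format fields (6, 5, 5 and 16 bits). A minimal C sketch, where encode_i is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Pack the I-format fields: op (6 bits), rs (5), rt (5),
   and a 16-bit constant or address. */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int16_t imm) {
    assert(op < 64 && rs < 32 && rt < 32);
    return (op << 26) | (rs << 21) | (rt << 16) | (uint32_t)(uint16_t)imm;
}
```

With op = 35, rs = 9 ($t1), rt = 8 ($t0) and the constant 1200, the result is 0x8D2804B0, the hexadecimal form of the lw row; changing op to 43 gives the sw row.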
Logical Instructions
Logical operations   C operators   Java operators   MIPS instructions
Shift left           <<            <<               sll
Shift right          >>            >>>              srl
Bit-by-bit AND       &             &                and, andi
Bit-by-bit OR        |             |                or, ori
Bit-by-bit NOT       ~             ~                nor
Sec. 2.6 Logical Operations
Example of Shifts
Shift left logical (sll)
sll $t2, $s0, 4
(before) 0000 0000 0000 1001_two = 9_ten
(after)  0000 0000 1001 0000_two = 144_ten
Shifting left by i bits gives the same result as multiplying by 2^i
Machine code fields:
op   rs   rt   rd   shamt   funct
0    0    16   10   4       0
Examples of AND/OR/NOR
$t1: 0011 1100 0000 0000_two
$t2: 0000 1101 0000 0000_two
and $t0, $t1, $t2    $t0: 0000 1100 0000 0000_two
or $t0, $t1, $t2     $t0: 0011 1101 0000 0000_two
nor $t0, $t1, $t2    $t0: 1100 0010 1111 1111_two
Example of NOT
There is no MIPS instruction for NOT logic
But... A NOR 0 = NOT(A OR 0) = NOT(A)
$t1: 0011 1100 0000 0000_two
nor $t0, $t1, $0
$t0: 1100 0011 1111 1111_two
MIPS also has andi and ori, but no nori
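The identity A NOR 0 = NOT(A) can be checked with C's bitwise operators. In the sketch below, nor32 and not32 are hypothetical names, and the registers are assumed to be full 32-bit values whose upper 16 bits are 0 (the slide shows only the lower 16 bits), so the upper 16 bits of each result come out as 1s.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-by-bit NOR, as computed by the MIPS nor instruction. */
uint32_t nor32(uint32_t a, uint32_t b) {
    return (uint32_t)~(a | b);
}

/* NOT built from NOR with a zero operand (like nor $t0, $t1, $0). */
uint32_t not32(uint32_t a) {
    uint32_t result = nor32(a, 0u);
    assert(result == (uint32_t)~a);  /* A NOR 0 == NOT(A) */
    return result;
}
```

The lower 16 bits of not32(0x3C00) are 1100 0011 1111 1111, and those of nor32(0x3C00, 0x0D00) are 1100 0010 1111 1111, matching the patterns shown for $t0.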
Instructions for Making Decisions
Conditional branch
Branch if equal
beq register1, register2, L1
Branch if not equal
bne register1, register2, L1
Unconditional branch
Jump
j Exit # go to Exit
Sec. 2.7 Instructions for Making Decisions
Compiling if-then-else
Compile the following C statements:
if(i==j) f=g+h;
else f=g-h;
[Flowchart: test i==j; if i=j, execute f=g+h; if i≠j, execute f=g-h (label Else:); both paths join at Exit:]
Compiling if-then-else (cont'd)
f, g, h, i, j: $s0 ~ $s4
bne $s3, $s4, Else
add $s0, $s1, $s2
j Exit
Else: sub $s0, $s1, $s2
Exit:
Compiling while Loop
while(save[i]==k)
i++;
MIPS assembly codes:
Loop: sll $t1, $s3, 2
add $t1, $t1, $s6
lw $t0, 0($t1)
bne $t0, $s5, Exit
addi $s3, $s3, 1
j Loop
Exit:
i: $s3 k: $s5 save: $s6
Instructions of set on less than
Register $t0 is set to 1 if the value of $s3 is
less than that of $s4 (otherwise 0)
slt $t0, $s3, $s4
Immediate version
slti $t0, $s2, 10
Example usage
C statement: if(n<1) ...
MIPS assembly codes
slti $t0, $a0, 1
bne $t0, $zero, IF
Extending the Usage of slt
When an inequality is to be evaluated, there
are four possible cases:
slt x, a, b
if(a < b) then x=1
if(a ≥ b) then x=0
slt x, b, a
if(b < a) then x=1
if(b ≥ a) then x=0
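All four orderings can therefore be built from slt plus a test of the result against zero (beq or bne with $zero). A C sketch of that idea, with hypothetical function names and signed 32-bit operands assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Model of slt: the result is 1 if a < b, otherwise 0. */
int slt(int32_t a, int32_t b) {
    int x = (a < b) ? 1 : 0;
    assert(x == 0 || x == 1);
    return x;
}

int is_lt(int32_t a, int32_t b) { return slt(a, b) != 0; }  /* slt x,a,b then bne */
int is_ge(int32_t a, int32_t b) { return slt(a, b) == 0; }  /* slt x,a,b then beq */
int is_gt(int32_t a, int32_t b) { return slt(b, a) != 0; }  /* slt x,b,a then bne */
int is_le(int32_t a, int32_t b) { return slt(b, a) == 0; }  /* slt x,b,a then beq */
```

Swapping the operands of slt and choosing between beq and bne is enough to cover <, >=, > and <= without any further comparison instruction.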
Supporting Procedures
Registers for Procedure Calling
$a0~$a3: to pass parameters
$v0~$v1: to return values
$ra: to return to the point of origin
jump-and-link instruction
jal ProcedureAddress
jump register instruction
jr $ra
Sec. 2.8 Supporting Procedures in Computer Hardware
Flows of Calling a Procedure
Caller-callee pair
Caller puts the parameter values in $a0~$a3
Caller uses jal X to jump to procedure X (i.e.,
the callee)
Callee performs the calculations
Callee places the results in $v0~$v1
Callee returns control to the caller using jr $ra
Using Memory to Preserve Registers
Since the number of registers is limited, when
performing procedure calls, the contents of registers
have to be preserved in the memory
[Figure: sw copies data from the 32 registers to memory; lw copies data from memory back to the registers]
Using More Registers
Stack: a last-in-first-out queue
A stack pointer is needed for the most
recently allocated address
Push: placing data onto the stack
Pop: removing data from the stack
$sp in MIPS: stacks grow from higher
addresses to lower addresses
Push: subtracting from $sp
Pop: adding to $sp
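The push and pop conventions can be mimicked with a small array-based stack that grows toward lower indices. This is only a toy C model under stated assumptions: sp is a word index rather than a MIPS byte address, and the names Stack, stack_init, push and pop are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A toy descending stack of 32-bit words. */
typedef struct {
    int32_t mem[STACK_WORDS];
    int sp;  /* index of the most recently pushed word; STACK_WORDS when empty */
} Stack;

void stack_init(Stack *s) {
    s->sp = STACK_WORDS;  /* start at the high end */
}

void push(Stack *s, int32_t value) {
    assert(s->sp > 0);            /* no room left below */
    s->sp -= 1;                   /* push: subtract from the stack pointer */
    s->mem[s->sp] = value;
}

int32_t pop(Stack *s) {
    assert(s->sp < STACK_WORDS);  /* stack must not be empty */
    int32_t value = s->mem[s->sp];
    s->sp += 1;                   /* pop: add to the stack pointer */
    return value;
}
```

Values come back in last-in-first-out order, and after equal numbers of pushes and pops the stack pointer is back at its starting value.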
Example of a Procedure
int leaf_example (int g, int h, int i, int j){
int f;
f = (g+h)-(i+j);
return f;
}
leaf_example: addi $sp, $sp, -12
sw $t1, 8($sp)
sw $t0, 4($sp)
sw $s0, 0($sp)
add $t0, $a0, $a1
add $t1, $a2, $a3
sub $s0, $t0, $t1
...
g, h, i, j: $a0~$a3 f: $s0
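Stack for the Previous Example
$sp always points to the top (i.e., the last word) of the stack
[Figure: stack drawn from high address down to low address, (a) before, (b) during and (c) after the procedure call. In (b), the saved contents of $t1, $t0 and $s0 occupy successively lower addresses and $sp points to the saved $s0]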
Example of a Procedure (cont'd)
leaf_example: addi $sp, $sp, -12
sw $t1, 8($sp)
sw $t0, 4($sp)
sw $s0, 0($sp)
add $t0, $a0, $a1
add $t1, $a2, $a3
sub $s0, $t0, $t1
add $v0, $s0, $zero
lw $s0, 0($sp)
lw $t0, 4($sp)
lw $t1, 8($sp)
addi $sp, $sp, 12
jr $ra
Saving & Restoring Registers
$t0~$t9
10 temporary registers that are not
preserved by the callee on a procedure call
$s0~$s7
8 saved registers that must be preserved
on a procedure call
Nested Procedure
Compile the following (recursive) C statements:
int fact (int n){
if(n<1) return 1;
else return(n*fact(n-1));
}
fact: addi $sp, $sp, -8
sw $ra, 4($sp)
sw $a0, 0($sp)
slti $t0, $a0, 1
beq $t0, $zero, L1
addi $v0, $zero, 1
addi $sp, $sp, 8
jr $ra
...
Nested Procedure (cont'd)
fact: addi $sp, $sp, -8
sw $ra, 4($sp)
sw $a0, 0($sp)
slti $t0, $a0, 1 #if(n<1) $t0=1
beq $t0, $zero, L1 #if(n>=1) go to L1
addi $v0, $zero, 1 #return 1
addi $sp, $sp, 8
jr $ra
L1: addi $a0, $a0, -1 #n>=1: $a0=n-1
jal fact
lw $a0, 0($sp)
lw $ra, 4($sp)
addi $sp, $sp, 8
mul $v0, $a0, $v0 #return n*fact(n-1)
jr $ra
What is Preserved across a Call?
How a C variable is stored depends on its type and storage class
Types: e.g., integers and characters
Storage classes: automatic and static
To simplify access to static data, MIPS reserves the global pointer
register, i.e., $gp
Preservation of $sp (critical!)
Callee adds exactly the same amount that was subtracted from it
Preserved                          Not preserved
Saved registers: $s0~$s7           Temporary registers: $t0~$t9
Stack pointer register: $sp        Argument registers: $a0~$a3
Return address register: $ra       Return value registers: $v0~$v1
Stack above the stack pointer      Stack below the stack pointer
Allocating Space for New Data on the
Stack
Procedure frame (or activation record)
Segment of the stack containing a procedure's
saved registers and local variables that do not fit
in registers (e.g., local arrays or structures)
Frame pointer ($fp)
$sp might change during the procedure, and so
references to a local variable might have different
offsets (i.e., making the procedure harder to
understand)
Alternatively, $fp offers a stable base register
Stack (cont'd)
$fp points to the first word of the frame and $sp
points to the top of the stack
[Figure: stack drawn from high address down to low address, (a) before, (b) during and (c) after the procedure call. In (b), the frame between $fp and $sp holds, from high to low addresses: saved argument registers (if any; needed if there are more than 4 parameters), saved return addr., saved saved registers (if any), and local arrays & structures (if any)]
Heap
Text segment: MIPS machine code
Static data segment: constants and other static variables
Heap: dynamic data structures
malloc() and free()
new and delete
Problems of memory leak & dangling pointers
Stack and heap grow toward each other, thereby allowing
using memory efficiently
Memory layout (high address to low address):
$sp  7fff fffc_hex    Stack (grows downward)
                      Dynamic data (heap, grows upward)
     1000 ffff_hex
$gp  1000 8000_hex    Static data
     1000 0000_hex
PC   0040 0000_hex    Text
     0                Reserved
$gp: using the 16-bit offset to access data easily
MIPS Register Conventions
Name       Register number   Usage                                        Preserved on call?
$zero      0                 the constant value 0                         n.a.
$v0~$v1    2~3               values for results & expression evaluation   no
$a0~$a3    4~7               arguments                                    no
$t0~$t7    8~15              temporaries                                  no
$s0~$s7    16~23             saved                                        yes
$t8~$t9    24~25             more temporaries                             no
$gp        28                global pointer                               yes
$sp        29                stack pointer                                yes
$fp        30                frame pointer                                yes
$ra        31                return address                               yes
Register 1 ($at) is reserved for the assembler and registers 26~27
($k0~$k1) are reserved for the OS
Comparison of Jump Instructions
Table: Unconditional Jump Instructions in MIPS
Instruction     Example   Meaning              Comments
jump            j L       go to L              Jump to targeted address
jump register   jr $ra    go to $ra            For procedure return
jump and link   jal L     $ra=PC+4; go to L    For procedure call
PC (program counter): the register containing the
address of the instruction in the program being
executed (i.e., instruction address register)
jal instruction saves PC+4 in $ra to link to the
following instruction to set up the procedure return
Processing Text
Use 8 bits to represent characters
ASCII (American Standard Code for Information Interchange)
Extracting a byte from a word
lb $t0, 0($sp) #Read byte from source
sb $t0, 0($gp) #Write byte to destination
Handling the rightmost 8 bits of a register
String: with a variable number of characters
Three choices for representing a string:
(1) First position of the string indicates the length
(2) An accompanying variable has the string length
(3) Last position is indicated by a special character that marks the
end of the string
C uses null (i.e., ASCII 0) as the terminator
"Cal" => 67, 97, 108, 0
Sec. 2.9 Communicating with People
String Copy Procedure
void strcpy(char x[], char y[]){
int i = 0;
while((x[i]=y[i]) != '\0') /* copy & test byte */
i += 1;
}
strcpy: addi $sp, $sp, -4
sw $s0, 0($sp)
add $s0, $zero, $zero
L1: add $t1, $s0, $a1
lb $t2, 0($t1)
add $t3, $s0, $a0
sb $t2, 0($t3)
beq $t2, $zero, L2
addi $s0, $s0, 1
j L1
L2: lw $s0, 0($sp)
addi $sp, $sp, 4
jr $ra
x[]: $a0 y[]: $a1 i: $s0
We don't have to multiply i by 4 since y is an array of bytes
Using $t0 for i can avoid saving & restoring $s0
Characters & Strings in Java
Unicode: a universal encoding of the alphabets of
most human languages
Java uses Unicode for characters
16 bits are used for representing a character
UTF-16 is the default encoding
UTF-8 keeps the ASCII subset as 8 bits and uses 16~32 bits
for the other characters (i.e., a variable-length encoding)
MIPS instructions for 16-bit halfwords
lh $t0, 0($sp) #Read halfword from source
sh $t0, 0($gp) #Write halfword to destination
Unlike C, Java includes a word that gives the length
of the string
32-Bit Immediate Operands
Although constants are frequently short and fit into
the 16-bit field, sometimes they are bigger
lui (load upper immediate) instruction can set the
upper 16 bits of a constant in a register
Assembler must have a temporary register available
to create the long values, i.e., the reason for $at
Sec. 2.10 MIPS Addressing for 32-Bit Immediates and Addresses
The machine language version of lui $t0, 255:
001111 00000 01000 0000 0000 1111 1111
Contents of $t0 then becomes:
0000 0000 1111 1111 0000 0000 0000 0000
Loading a 32-Bit Constant
What is the assembly code to load this 32-bit
constant into register $s0?
0000 0000 0011 1101 0000 1001 0000 0000
Load the upper 16 bits, then add the lower 16 bits
lui $s0, 61
$s0: 0000 0000 0011 1101 0000 0000 0000 0000
ori $s0, $s0, 2304
$s0: 0000 0000 0011 1101 0000 1001 0000 0000
Can we use addi instruction instead?
Not in general. addi sign-extends: it copies the leftmost bit of the
16-bit immediate into the upper 16 bits of a word, while ori loads 0s
into the upper 16 bits
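The lui/ori sequence and the difference from addi can be modelled in a few lines of C. This is a sketch under stated assumptions: the functions lui, ori, addi and load_constant are hypothetical models of the instructions' arithmetic only, and the overflow exception that a real addi can raise is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* lui: place the 16-bit immediate in the upper half, zeros below. */
uint32_t lui(uint16_t imm) {
    return (uint32_t)imm << 16;
}

/* ori: the immediate is zero-extended before the OR. */
uint32_t ori(uint32_t reg, uint16_t imm) {
    return reg | (uint32_t)imm;
}

/* addi: the immediate is sign-extended before the addition. */
uint32_t addi(uint32_t reg, int16_t imm) {
    return reg + (uint32_t)(int32_t)imm;
}

/* Build any 32-bit constant from its upper and lower halves. */
uint32_t load_constant(uint32_t value) {
    uint32_t reg = ori(lui((uint16_t)(value >> 16)), (uint16_t)(value & 0xFFFFu));
    assert(reg == value);
    return reg;
}
```

For the constant on this slide, ori(lui(61), 2304) gives 0x003D0900 (4,000,000 in decimal). When bit 15 of the lower half is 1, addi and ori disagree, because the sign extension of addi subtracts from the upper half.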
Addressing in Branches and Jumps
J-type instruction
j 10000
op       address
2        10000
6 bits   26 bits
Conditional branch instruction
bne $s0, $s1, Exit
op       rs       rt       address
5        16       17       Exit
6 bits   5 bits   5 bits   16 bits
If addresses of the program had to fit in this 16-bit field, it
would mean that no program could be bigger than 2^16 (far too
small in practice!)
PC-Relative Addressing
For branching instructions,
PC = register (i.e., PC+4) + branch address
This allows the program to be as large as 2^32
Which register? PC is the ideal choice
Since conditional branches are found in loops and in if statements,
they tend to branch to a nearby instruction
Branching within ±2^15 words of the current instruction
MIPS address is actually relative to the following instruction (PC+4)
as opposed to the current instruction (PC)
However, for J-type instructions (i.e., jump and jump-and-link),
long addresses are used
Since the nearby criterion may not hold
Using the 26-bit field (in word units) to represent a 28-bit byte address
Branch Offset in Machine Language
Give machine codes for the following MIPS assembly codes (i.e., the while loop):
Loop: sll $t1, $s3, 2
add $t1, $t1, $s6
lw $t0, 0($t1)
bne $t0, $s5, Exit
addi $s3, $s3, 1
j Loop
Exit:
while(save[i]==k)
i++;
Address   Fields (op first)
80000     0    0    19   9    2    0
80004     0    9    22   9    0    32
80008     35   9    8    0
80012     5    8    21   2
80016     8    19   19   1
80020     2    20000
80024     ...
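The address fields of the bne and j instructions in this example can be checked with a little arithmetic. The C sketch below models PC-relative and pseudodirect addressing; branch_target and jump_target are hypothetical names, and the addresses are the decimal ones used in the example.

```c
#include <assert.h>
#include <stdint.h>

/* PC-relative addressing: the 16-bit field counts words
   relative to the following instruction (PC+4). */
uint32_t branch_target(uint32_t pc, int16_t word_offset) {
    return pc + 4u + ((uint32_t)(int32_t)word_offset << 2);
}

/* Pseudodirect addressing: the 26-bit field, shifted left by 2,
   is concatenated with the upper 4 bits of PC+4. */
uint32_t jump_target(uint32_t pc, uint32_t addr26) {
    assert(addr26 < (1u << 26));
    return ((pc + 4u) & 0xF0000000u) | (addr26 << 2);
}
```

The bne at address 80012 with offset 2 reaches 80012 + 4 + 8 = 80024 (Exit), and the j at 80020 with field 20000 reaches 20000 × 4 = 80000 (Loop).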
Branching Far Away
Problem: occasionally the conditional branch is to a
location far away rather than a nearby location
Solution for the assembler:
Insert an unconditional jump to the branch target
Invert the condition so that the branch decides whether to
skip the jump
Example: (1) & (2) are logically identical
(1) beq $s0, $s1, L1
(2) bne $s0, $s1, L2
j L1
L2:
MIPS Addressing Mode
Multiple forms of MIPS addressing mode
Register addressing: operand is a register
Base (or displacement) addressing: operand is at
the memory location whose address is the sum of
a register and a constant in the instruction
Immediate addressing: operand is a constant
PC-relative addressing: address is the sum of the
PC and a constant in the instruction
Pseudodirect addressing: jump address is the 26
bits concatenated with the upper bits of the PC
MIPS Addressing Mode (cont'd)
1. Immediate addressing (e.g., addi $s1, $s2, 100)
Fields: op | rs | rt | immediate
2. Register addressing (e.g., add $s1, $s2, $s3)
Fields: op | rs | rt | rd | ... | funct; the operand is in one of the registers
3. Base addressing (e.g., lw $s1, 100($s2))
Fields: op | rs | rt | address; register + address selects a byte, halfword or word in memory
MIPS Addressing Mode (cont'd)
4. PC-relative addressing (e.g., beq $s1, $s2, 100)
Fields: op | rs | rt | address; (PC+4) + (address << 2) selects a word in memory
5. Pseudodirect addressing (e.g., j 10000)
Fields: op | address; upper bits of PC : (address << 2) selects a word in memory
Decoding Machine Codes
Reverse engineering: machine code -> assembly instructions
Example: what is the MIPS instruction corresponding
to 00af8020_hex?
Convert hexadecimal to binary to find the op field
0000 0000 1010 1111 1000 0000 0010 0000_two
op       rs      rt      rd      shamt   funct
000000   00101   01111   10000   00000   100000
add $s0, $a1, $t7
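The same decoding can be done mechanically with shifts and masks. A C sketch, assuming an R-format word; the struct RFields and the function decode_r are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The six R-format fields of a MIPS instruction word. */
typedef struct {
    uint32_t op, rs, rt, rd, shamt, funct;
} RFields;

/* Split a 32-bit instruction word into its R-format fields. */
RFields decode_r(uint32_t word) {
    RFields f;
    f.op    = (word >> 26) & 0x3Fu;
    f.rs    = (word >> 21) & 0x1Fu;
    f.rt    = (word >> 16) & 0x1Fu;
    f.rd    = (word >> 11) & 0x1Fu;
    f.shamt = (word >> 6)  & 0x1Fu;
    f.funct = word & 0x3Fu;
    /* Round-trip check: repacking the fields gives back the word. */
    assert(((f.op << 26) | (f.rs << 21) | (f.rt << 16) |
            (f.rd << 11) | (f.shamt << 6) | f.funct) == word);
    return f;
}
```

For 0x00af8020 the fields are op = 0, rs = 5 ($a1), rt = 15 ($t7), rd = 16 ($s0), shamt = 0 and funct = 32 (add).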
A Translation Hierarchy for C
Sec. 2.12 Translating and Starting a Program
Many compilers produce
object modules directly
Static linking
Compiler
Transforms the C program into an assembly
language program
High-level-language programs take fewer lines of
codes, so programmer productivity is much higher
In 1975, many operating systems and assemblers
were written in assembly language because
Memory space was small
Compilers were inefficient
Assembler
Pseudoinstructions: since assembly language is the
interface to higher-level software, the assembler can
also handle common variations of machine language
instructions
Hardware need not implement such instructions
These instructions simplify translation and programming
Examples:
(1) move $t0, $t1 -> add $t0, $zero, $t1
(2) blt (branch on less than) -> slt and bne
(3) bgt, bge and ble
The only cost is reserving $at for use by the assembler
Assembler (cont'd)
Assembler turns the assembly language program into
an object file
A combination of machine language instructions, data and
information needed for memory mapping
Object file for UNIX typically contains 6 pieces
(1) file header (2) text segment (3) static data segment
(4) relocation info: identifies instructions and data words that
depend on absolute memory addresses
(5) symbol table: matches label names to memory addresses
(6) debugging info: descriptions of how the modules were
compiled, so that a debugger can associate machine
instructions with C source files
Linker
Does a single change to one line of the code require
compiling and assembling the whole program?
No, each procedure can be compiled and assembled
independently; use a linker (or link editor) to stitch them together
Linker finds the old addresses and replaces them
with the new addresses by using relocation info and
symbol table
Place code and data modules symbolically in memory
Determine the addresses of data and instruction labels
Patch both the internal and external references (i.e.,
branches, jumps and data accesses)
Linker (cont'd)
Linker produces an executable file
The same format as an object file, except that it
contains no unresolved references, relocation info,
symbol table or debugging info
Partially linked files are also possible
E.g., library routines
Unresolved addresses may still exist and hence result in
object files
Example: Linking Object Files
Procedure A: text size=100_hex, data size=20_hex
Text segment      Address   Instruction
                  0         lw $a0, 0($gp)
                  4         jal 0
                  ...       ...
Data segment      0         (X)
                  ...       ...
Relocation info   Address   Instruction type   Dependency
                  0         lw                 X
                  4         jal                B
Symbol table      Label     Address
                  X         -
                  B         -
Procedure B: text size=200_hex, data size=30_hex
Text segment      Address   Instruction
                  0         sw $a1, 0($gp)
                  4         jal 0
                  ...       ...
Data segment      0         (Y)
                  ...       ...
Relocation info   Address   Instruction type   Dependency
                  0         sw                 Y
                  4         jal                A
Symbol table      Label     Address
                  Y         -
                  A         -
Place 0 in the jal instruction of Procedure A since the address of B is not
determined yet
Recall: Allocating Space on the Heap
Text segment starts at 0040 0000_hex (initial PC)
Static data segment starts at 1000 0000_hex; $gp = 1000 8000_hex, so the
16-bit offset reaches the static data easily
Dynamic data (heap) grows upward above the static data; the stack grows
downward from $sp = 7fff fffc_hex
Example: Linking Object Files (cont'd)
Executable file: text size=300_hex, data size=50_hex
Text segment   Address          Instruction
               0040 0000_hex    lw $a0, 8000_hex($gp)
               0040 0004_hex    jal 40 0100_hex
               ...              ...
               0040 0100_hex    sw $a1, 8020_hex($gp)
               0040 0104_hex    jal 40 0000_hex
               ...              ...
Data segment   Address
               1000 0000_hex    (X)
               ...              ...
               1000 0020_hex    (Y)
               ...              ...
Offset 8000_hex is due to the 16-bit 2's complement arithmetic
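The offsets 8000_hex and 8020_hex follow from $gp = 1000 8000_hex: X and Y lie below $gp, so the 16-bit field holds a negative two's complement offset. A C sketch of the calculation, where gp_offset is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit offset field for a $gp-relative lw/sw: the signed distance
   from $gp to the data address, kept as a 16-bit two's complement pattern. */
uint16_t gp_offset(uint32_t gp, uint32_t data_addr) {
    int64_t diff = (int64_t)data_addr - (int64_t)gp;
    assert(diff >= -32768 && diff <= 32767);  /* must fit in 16 signed bits */
    return (uint16_t)(diff & 0xFFFF);
}
```

X at 1000 0000_hex is 8000_hex bytes below $gp, i.e. offset -32768, whose 16-bit pattern is 8000_hex; Y at 1000 0020_hex gives 8020_hex.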
Loader
Operating system reads the executable file from disk
to memory and starts it
Reads the file header to determine the size of both text and
data segments
Creates an address space which is large enough
Copies the instructions and data into memory
Copies the parameters (if any) to the main program onto the
stack
Initializes the machine registers and sets $sp to the first
free location
Jumps to the start-up routine. When the main routine
returns, the program is terminated with an exit system call
Dynamically Linked Libraries
Static approach: link libraries before a program is run
Advantage: the fastest way to call library routines
Disadvantages: library routines become part of the
executable code
New, updated versions of libraries cannot be incorporated into
statically linked programs
The library can take up a lot of memory (e.g., standard C library is
2.5 MB)
Dynamically linked libraries (DLLs)
Library routines are not linked and loaded until the program
is run
Initial version (not so good!): the loader ran a dynamic linker to update all
external references
Dynamically Linked Libraries (cont'd)
Downside of the initial version
It still linked all library routines which might be called, rather than
only those actually called during the running of the program
Lazy version of DLLs: each routine is linked
only after it is called
DLLs require extra space for the information
needed for dynamic linking, but do not require
that whole libraries be copied or linked
Overhead is paid the first time a routine is
called, but only a single indirect jump thereafter
Lazy Linkage
Indirection table
Stub: Loads routine ID,
Jump to linker/loader
Linker/loader code
Dynamically
mapped code
A Translation Hierarchy for Java
[Figure callouts: Java bytecodes are a simple portable instruction set for
the JVM; the JVM interprets bytecodes; the JIT compiler compiles bytecodes of
hot methods into native code for the host machine]
Java Bytecode & JVM
Java bytecode instruction set
Designed to be close to the Java language so that the
compilation step is trivial
Virtually no optimization is performed
Java virtual machine (JVM)
A software interpreter (i.e., a program that simulates an
instruction set architecture) for executing Java bytecodes
No assembly step for Java since
Translation is so simple that the compiler fills in the addresses
JVM finds addresses at runtime
Java Interpretation
Pros and cons
(+) Portability: JVM can be found in millions of devices from
cell phones to Internet browsers
(-) Low performance: factor of 10 slowdown when
compared to compiled C programs
Just In Time compilers (JIT)
To preserve portability and to improve execution speed, JIT
compiles hot methods into the native instruction set while
the program is running
Compiled portion is saved for the next time the program is
run, making it run faster each time (i.e., balance of
interpretation and compilation evolves with time)
Procedure swap in C
void swap(int v[], int k){
int temp;
temp=v[k];
v[k]=v[k+1];
v[k+1]=temp;
}
When translating from C to assembly language by
hand, we follow these general steps:
Allocate registers to program variables
Produce code for the body of the procedure
Preserve registers across the procedure invocation
Sec. 2.13 A C Sort Example to Put It All Together
Procedure swap
Register allocation for swap
v: $a0, k: $a1
A few temporary registers
No saved register
Assembly instructions
swap: sll $t1, $a1, 2 # $t1=k*4
add $t1, $a0, $t1 # $t1=v+(k*4)
# -> address of v[k]
lw $t0, 0($t1) # $t0(temp)=v[k]
lw $t2, 4($t1) # $t2=v[k+1]
sw $t2, 0($t1) # v[k]=$t2
sw $t0, 4($t1) # v[k+1]=$t0(temp)
jr $ra
Procedure sort in C
In this case, we'll build a routine that calls the swap
procedure to sort an array of integers (i.e., bubble or
exchange sort) in ascending order
void sort(int v[], int n){
int i, j;
for(i=0; i<n; i+=1){
for(j=i-1; j>=0 && v[j]>v[j+1]; j-=1){
swap(v, j);
}
}
}
First for Loop of Procedure sort
C for statement has three parts: initialization, loop test and
iteration increment (e.g., for(i=0; i<n; i+=1))
move $s0, $zero # i=0
for1tst: slt $t0, $s0, $a1 # $t0=0 if i>=n
beq $t0, $zero, exit1 # go to exit1 if i>=n
...
(body of the first for loop)
...
addi $s0, $s0, 1 # i+=1
j for1tst
exit1:
Test of
i<n
Second for Loop of Procedure sort
C statement: for(j=i-1; j>=0 && v[j]>v[j+1]; j-=1)
addi $s1, $s0, -1 # j=i-1
for2tst: slti $t0, $s1, 0 # $t0=1 if j<0
bne $t0, $zero, exit2 # go to exit2 if j<0
sll $t1, $s1, 2 # $t1=j*4
add $t2, $a0, $t1 # $t2=v+(j*4)
lw $t3, 0($t2) # $t3=v[j]
lw $t4, 4($t2) # $t4=v[j+1]
slt $t0, $t4, $t3 # $t0=0 if $t4>=$t3
beq $t0, $zero, exit2 # go to exit2 if $t4>=$t3
...
(body of the second for loop)
...
addi $s1, $s1, -1 # j-=1
j for2tst
exit2:
Test of
j>=0
Test of
v[j]>v[j+1]
Passing Parameters in sort
Problem: both the sort and the swap procedures
need the values in registers $a0 and $a1
Solution: to copy the parameters into other registers
earlier in the procedure
move $s2, $a0 # copy parameter $a0 into $s2
move $s3, $a1 # copy parameter $a1 into $s3
move $a0, $s2 # first swap parameter is v
move $a1, $s1 # second swap parameter is j
Redundant
instructions?
Full Version of sort
sort:    addi $sp, $sp, -20      # register preservation: room for 5 words
         sw $ra, 16($sp)
         sw $s3, 12($sp)
         sw $s2, 8($sp)
         sw $s1, 4($sp)
         sw $s0, 0($sp)
         move $s2, $a0           # copy parameter v into $s2
         move $s3, $a1           # copy parameter n into $s3
         move $s0, $zero         # i=0
for1tst: slt $t0, $s0, $s3       # $t0=0 if i>=n
         beq $t0, $zero, exit1
         addi $s1, $s0, -1       # j=i-1
for2tst: slti $t0, $s1, 0        # $t0=1 if j<0
         bne $t0, $zero, exit2
         sll $t1, $s1, 2         # $t1=j*4
         add $t2, $s2, $t1       # $t2=v+(j*4)
         lw $t3, 0($t2)          # $t3=v[j]
         lw $t4, 4($t2)          # $t4=v[j+1]
         slt $t0, $t4, $t3       # $t0=0 if $t4>=$t3
         beq $t0, $zero, exit2
         move $a0, $s2           # first swap parameter is v
         move $a1, $s1           # second swap parameter is j
         jal swap                # callout: inlining optimization
         addi $s1, $s1, -1       # j-=1
         j for2tst
exit2:   addi $s0, $s0, 1        # i+=1
         j for1tst
exit1:   lw $s0, 0($sp)          # register preservation: restore
         lw $s1, 4($sp)
         lw $s2, 8($sp)
         lw $s3, 12($sp)
         lw $ra, 16($sp)
         addi $sp, $sp, 20
         jr $ra
Using Optimizations for Bubble Sort
gcc optimization             Relative performance   Clock cycles (millions)   Instruction count (millions)   CPI
none                         1.00                   158,615                   114,938                        1.38
O1 (medium)                  2.37                   66,900                    37,470                         1.79
O2 (full)                    2.38                   66,521                    39,993                         1.66
O3 (procedure integration)   2.41                   65,747                    44,993                         1.46
Time is the only accurate measure of program performance!
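The CPI column is the clock cycle count divided by the instruction count, which is why neither column alone predicts performance. A small C sketch of that ratio, where cpi is a hypothetical function name and the counts are the figures (in millions) reported for the bubble sort measurements:

```c
#include <assert.h>

/* Cycles per instruction: total clock cycles divided by
   the number of instructions executed (same units for both). */
double cpi(double clock_cycles, double instruction_count) {
    assert(instruction_count > 0.0);
    return clock_cycles / instruction_count;
}
```

For the unoptimized code, 158,615 / 114,938 is about 1.38; for O3, 65,747 / 44,993 is about 1.46. O1 executes the fewest instructions, yet O3 has the fewest clock cycles and is therefore the fastest.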
Comparison of 2 Sorting Algorithms
To sort 100,000 items
Method             Optimization   Bubble Sort relative performance   Quicksort relative performance   Speedup Quicksort vs. Bubble Sort
C compiler         none           1.00                               1.00                             2468
C compiler         O1             2.37                               1.50                             1562
C compiler         O2             2.38                               1.50                             1555
C compiler         O3             2.41                               1.91                             1955
Java Interpreter   -              0.12                               0.05                             1050
JIT compiler       -              2.13                               0.29                             338
Quicksort beats Bubble Sort by a factor of 50 (0.05*2468=123 vs. 2.41)
C Procedures for Clearing an Array
void clear1(int array[], int size){
int i;
for(i=0; i<size; i+=1)
array[i]=0;
}
void clear2(int *array, int size){
int *p;
for(p=&array[0]; p<&array[size]; p=p+1)
*p=0;
}
Sec. 2.14 Arrays versus Pointers
Array Version of Clear()
void clear1(int array[], int size){
int i;
for(i=0; i<size; i+=1)
array[i]=0;
}
move $t0, $zero # i=0
loop1: sll $t1, $t0, 2 # $t1=i*4
add $t2, $a0, $t1 # $t2=address of array[i]
sw $zero, 0($t2) # array[i]=0
addi $t0, $t0, 1 # i=i+1
slt $t3, $t0, $a1 # $t3=(i<size)
bne $t3, $zero, loop1 # if(i<size) go to loop1
Assume
that size>0
Pointer Version of Clear()
void clear2(int *array, int size){
int *p;
for(p=&array[0]; p<&array[size]; p=p+1)
*p=0;
}
move $t0, $a0 # p=address of array[0]
loop2: sw $zero, 0($t0) # Memory[p]=0
addi $t0, $t0, 4 # p=p+4
sll $t1, $a1, 2 # $t1=size*4
add $t2, $a0, $t1 # $t2=address of
# array[size]
slt $t3, $t0, $t2 # $t3=(p<&array[size])
bne $t3, $zero, loop2 # if(p<&array[size])
# go to loop2
Assume
that size>0
A Faster Version of Clear()
Manual optimizations
Code motion: to utilize loop optimization
Strength reduction: shift instead of multiply
Induction variable elimination: eliminating array address
calculations within loops
move $t0, $a0 # p=address of array[0]
sll $t1, $a1, 2 # $t1=size*4
add $t2, $a0, $t1 # $t2=address of
# array[size]
loop2: sw $zero, 0($t0) # Memory[p]=0
addi $t0, $t0, 4 # p=p+4
slt $t3, $t0, $t2 # $t3=(p<&array[size])
bne $t3, $zero, loop2 # if(p<&array[size])
# go to loop2
Loop invariant
instructions
Comparing the Two Versions
Array version
       move $t0, $zero
loop1: sll $t1, $t0, 2
       add $t2, $a0, $t1
       sw $zero, 0($t2)
       addi $t0, $t0, 1
       slt $t3, $t0, $a1
       bne $t3, $zero, loop1
Pointer version
       move $t0, $a0
       sll $t1, $a1, 2
       add $t2, $a0, $t1
loop2: sw $zero, 0($t0)
       addi $t0, $t0, 4
       slt $t3, $t0, $t2
       bne $t3, $zero, loop2
Overhead of the array version: calculation of array addresses
Index i is incremented and new address must be recalculated
Pointer p is incremented directly
Fallacies
Powerful instruction => higher performance
Fewer instructions required
But complex instructions are hard to implement
May slow down all instructions, including simple ones
Compilers are good at making fast code from
simple instructions
Use assembly code for high performance
But modern compilers are better at dealing with
modern processors
More lines of code => more errors and less
productivity
Sec. 2.18 Fallacies and Pitfalls
Pitfalls
Forgetting that sequential word addresses in
machines with byte addressing do not differ
by one
Increment by 4, not by 1!
Keeping a pointer to an automatic variable
after procedure returns
A common mistake in dealing with pointers is to
pass a result from a procedure that includes a
pointer to an array that is local to that procedure
Usage of MIPS Instructions
Sec. 2.19 Concluding Remarks
Instruction class    MIPS examples                        HLL correspondence                                     Frequency (Integer)   Frequency (Floating pt.)
Arithmetic           add, sub, addi                       Operations in assignment statements                    24%                   48%
Data transfer        lw, sw, lb, sb, lui                  References to data structures (e.g., arrays)           36%                   39%
Logical              and, or, nor, andi, ori, sll, srl    Operations in assignment statements                    18%                   4%
Conditional branch   beq, bne, slt, slti                  if statements and loops                                18%                   6%
Jump                 j, jr, jal                           Procedure calls, returns, and case/switch statements   3%                    0%