
UNIT 1

1.1 Basic Compiler Functions


1.2 Grammars
1.3 Lexical Analysis
1.4 Syntactic Analysis
1.5 Code Generation
1.6 Heap Management
1.7 Parameter Passing Methods
1.8 Semantics of Calls and Returns
1.9 Implementing Subprograms
1.10 Stack Dynamic Local Variables
1.11 Dynamic binding of method calls to methods
1.12 Overview of Memory Management, Virtual Memory, Process
Creation
1.13 Overview of I/O Systems, Device Drivers, System Boot

1.1 Basic Compiler Functions
compiler
Translators from one representation of the program to another
Typically from high level source code to low level machine code or object
code
Source code is normally optimized for human readability
Expressive: matches our notion of languages (and application?!)
Redundant to help avoid programming errors
Machine code is optimized for hardware
Redundancy is reduced
Information about the intent is lost
This code should execute faster and use fewer resources.
How to translate
Source code and machine code mismatch in level of abstraction
We have to take some steps to go from the source code to machine code.
Some languages are farther from machine code than others
Goals of translation
High level of abstraction
Good performance for the generated code
Good compile time performance
Maintainable code



The big picture
Compiler is part of program development environment
The other typical components of this environment are editor, assembler,
linker, loader, debugger, profiler etc.
Profiler will tell the parts of the program which are slow or which have
never been executed.
The compiler (and all other tools) must support each other for easy program
development


1.2 Context Free Grammars
Introduction
Finite Automata accept all regular languages and only regular languages
Many simple languages are non regular:
{a^n b^n : n = 0, 1, 2, ...}
{w : w is a palindrome}
and there is no finite automaton that accepts them.
context-free languages are a larger class of languages that encompasses all
regular languages and many others, including the two above.
Context-Free Grammars
Languages that are generated by context-free grammars are context-free
languages
Context-free grammars are more expressive than finite automata: if a
language L is accepted by a finite automata then L can be generated by a
context-free grammar
Beware: The converse is NOT true
Definition.
A context-free grammar is a 4-tuple (Σ, NT, R, S), where:
Σ is an alphabet (each character in Σ is called a terminal)
NT is a set (each element in NT is called a nonterminal)
R, the set of rules, is a subset of NT × (Σ ∪ NT)*
if (α, β) ∈ R, we write the production α → β
(β is called a sentential form)
S, the start symbol, is one of the symbols in NT
CFGs: Alternate Definition
many textbooks use different symbols and terms to describe CFGs
G = (V, Σ, P, S)
V = variables, a finite set
Σ = alphabet or terminals, a finite set
P = productions, a finite set
S = start variable, S ∈ V
Productions have the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*
Derivations
Definition. v is one-step derivable from u, written u ⇒ v, if:
u = xαz
v = xβz
α → β is in R
Definition. v is derivable from u, written u ⇒* v, if:
There is a chain of one-step derivations of the form:
u ⇒ u1 ⇒ u2 ⇒ ... ⇒ v
Context-Free Languages
Definition. Given a context-free grammar
G = (Σ, NT, R, S), the language generated or
derived from G is the set:
L(G) = {w : S ⇒* w}
Definition. A language L is context-free if there is a context-free grammar G =
(Σ, NT, R, S), such that L is generated from G
CFGs & CFLs: Example 1
{a^n b^n | n ≥ 0}
One of our canonical non-RLs.
S → ε | a S b
Formally: G = ({S}, {a,b},
{S → ε, S → a S b}, S)
CFGs & CFLs: Example 2
all strings of balanced parentheses
A core idea of most programming languages.
Another non-RL.
P → ε | ( P ) | P P
CFGs & CFLs: Lessons
Both examples used a common CFG technique, wrapping around a
recursive variable.
S → a S b        P → ( P )
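As a small illustration (not from the original notes), the balanced-parentheses language can be recognized with a single counter, which plays the role of the stack that a pushdown automaton for P → ε | ( P ) | P P would use:
#include <stdio.h>

/* Returns 1 if s is a balanced string of parentheses, 0 otherwise. */
int balanced(const char *s) {
    int depth = 0;                      /* number of unmatched '(' so far */
    for (; *s != '\0'; s++) {
        if (*s == '(') depth++;
        else if (*s == ')') {
            if (depth == 0) return 0;   /* ')' with no matching '(' */
            depth--;
        } else return 0;                /* not a parenthesis */
    }
    return depth == 0;                  /* every '(' must be closed */
}

int main(void) {
    printf("%d %d %d\n", balanced("(()())"), balanced("(()"), balanced(")("));
    return 0;                           /* prints: 1 0 0 */
}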
CFGs & CFLs: Non-Example
{a^n b^n c^n | n ≥ 0}
Can't be done; we will see the CFL pumping lemma later.
Intuition: we can count up to n, then count down from n, but in doing so we forget n.
I.e., a stack used as a counter.
Will see this when using a machine corresponding to CFGs.

Parse Tree
A parse tree of a derivation is a tree in which:
Each internal node is labeled with a nonterminal
If a rule A → A1 A2 ... An occurs in the derivation, then A is a parent
node of the nodes labeled A1, A2, ..., An


S → A | A B
A → ε | a | A b | A A
B → b | b c | B c | b B
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb
These two derivations use the same productions, but in different orders.
This ordering difference is often uninteresting.
Derivation trees give a way to abstract away ordering differences.



Leftmost, Rightmost Derivations
Definition.
A left-most derivation of a sentential form is one in which rules
transforming the left-most nonterminal are always applied
Definition.
A right-most derivation of a sentential form is one in which rules
transforming the right-most nonterminal are always applied
Leftmost & Rightmost Derivations
S → A | A B
A → ε | a | A b | A A
B → b | b c | B c | b B
Sample derivations:
S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb

These two derivations are special.
The 1st derivation is leftmost: it always picks the leftmost variable.
The 2nd derivation is rightmost: it always picks the rightmost variable.
- In proofs
o Restrict attention to left- or rightmost derivations.
- In parsing algorithms
o Restrict attention to left- or rightmost derivations.
o E.g., recursive descent uses leftmost; yacc uses rightmost.
Derivation Trees

Ambiguous Grammar
Definition. A grammar G is ambiguous if there is a word w ∈ L(G) having at
least two different parse trees
S → A
S → B
S → A B
A → a A
B → b B
A → ε
B → ε
Notice that the string a has at least two left-most derivations


Ambiguity
- A CFG is ambiguous if any of the following equivalent statements holds:
o there exists a string w with multiple derivation trees.
o there exists a string w with multiple leftmost derivations.
o there exists a string w with multiple rightmost derivations.
- We are defining ambiguity of the grammar, not of the language.
Ambiguity & Disambiguation
- Given an ambiguous grammar, would like an equivalent unambiguous
grammar.
o Allows you to know more about structure of a given derivation.
o Simplifies inductive proofs on derivations.
o Can lead to more efficient parsing algorithms.
o In programming languages, we want to impose a canonical structure on
derivations, e.g., for 1 + 2 * 3.
- Strategy: Force an ordering on all derivations.
Disambiguation: Example 1
Exp → n
| Exp + Exp
| Exp * Exp
What is an equivalent unambiguous grammar?
Exp → Term
| Term + Exp
Term → n
| n * Term
Uses
- operator precedence
- operator associativity
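For instance, under this unambiguous grammar the leftmost derivation of n + n * n (e.g., 1 + 2 * 3) is
Exp ⇒ Term + Exp ⇒ n + Exp ⇒ n + Term ⇒ n + n * Term ⇒ n + n * n
so the multiplication is grouped inside a single Term, i.e., 1 + (2 * 3), which is the canonical structure we wanted.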
Disambiguation
What is a general algorithm?
None exists!
There are CFLs that are inherently ambiguous

Every CFG for this language is ambiguous.
E.g., {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} ∪ {a^n b^m c^m d^n | n ≥ 1, m ≥ 1}.
So, we can't necessarily eliminate ambiguity!
CFG Simplification
- Can't always eliminate ambiguity.
- But, CFG simplification & restriction still useful theoretically &
pragmatically.
o Simpler grammars are easier to understand.
o Simpler grammars can lead to faster parsing.
o Restricted forms useful for some parsing algorithms.
o Restricted forms can give you more knowledge about derivations.
CFG Simplification: Example
How can the following be simplified?
S → A B
S → A C D
A → A a
A → a
A → a A
A → a
C → ε
D → d D
D → E
E → e A e
F → f f


1) Delete: B is useless because nothing is derivable from B.
2) Delete either A → A a or A → a A.
3) Delete one of the identical A → a productions.
4) Delete C → ε and also replace S → A C D with S → A D.
5) Replace D → E with D → e A e.
6) Delete: E is useless after change #5.
7) Delete: F is useless because it is not derivable from S.
CFG Simplification
Eliminate ambiguity.
Eliminate useless variables.
Eliminate ε-productions: A → ε.
Eliminate unit productions: A → B.
Eliminate redundant productions.
Trade left- & right-recursion.
Trading Left- & Right-Recursion
Left recursion: A → A a
Right recursion: A → a A
Most parsing algorithms have trouble with one or the other.
In recursive descent, avoid left recursion.
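A minimal recursive-descent sketch in C (illustrative only, not from the notes) for the unambiguous expression grammar from the disambiguation example: Exp → Term + Exp | Term and Term → n * Term | n. Both rules are right-recursive, which is exactly why recursive descent can handle them; a left-recursive rule would make the parser call itself without consuming any input.
#include <ctype.h>

static const char *p;                     /* current input position */
static int term(void);

static int exp_(void) {                   /* Exp -> Term ('+' Exp)? */
    int v = term();
    if (*p == '+') { p++; v += exp_(); }
    return v;
}

static int term(void) {                   /* Term -> n ('*' Term)? */
    int v = 0;
    while (isdigit((unsigned char)*p))    /* n: a decimal number */
        v = v * 10 + (*p++ - '0');
    if (*p == '*') { p++; v *= term(); }
    return v;
}

/* Usage: p = "1+2*3"; exp_() returns 7, showing '*' binds tighter than '+'. */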
1.3 Lexical Analysis
Recognizing words is not completely trivial.
Therefore, we must know what the word separators are (blank, punctuation,
etc.)
The language must define rules for breaking a sentence into a sequence of
words.
Compilers are much less smart than humans, so programming languages must
have more specific rules.
Normally white space and punctuation are word separators in languages.
In programming languages a character from a different class may also be
treated as a word separator, e.g., if (a==b)
The lexical analyzer breaks a sentence into a sequence of words or tokens:
If a == b then a = 1 ; else a = 2 ;
Sequence of words (14 words in total):
if  a  ==  b  then  a  =  1  ;  else  a  =  2  ;
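A minimal sketch of such a lexer in C (illustrative only): whitespace separates words, identifiers/numbers are maximal alphanumeric runs, and "==" is recognized as a single two-character token.
#include <stdio.h>
#include <ctype.h>

int main(void) {
    const char *s = "if a==b then a=1; else a=2;";
    while (*s) {
        if (isspace((unsigned char)*s)) { s++; continue; }   /* word separator */
        if (isalnum((unsigned char)*s)) {                    /* identifier or number */
            const char *start = s;
            while (isalnum((unsigned char)*s)) s++;
            printf("%.*s\n", (int)(s - start), start);
        } else if (*s == '=' && s[1] == '=') {               /* two-character token */
            printf("==\n"); s += 2;
        } else {                                             /* single-character token */
            printf("%c\n", *s); s++;
        }
    }
    return 0;   /* prints the 14 tokens: if a == b then a = 1 ; else a = 2 ; */
}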

The next step
Once the words are understood, the next step is to understand the structure
of the sentence
The process is known as syntax checking or parsing
Parsing
Parsing a program is exactly the same process.
Consider the statement: if x == y then z = 1 else z = 2
Its parse tree is an if-stmt node with three children: the predicate (x == y),
the then-stmt (z = 1), and the else-stmt (z = 2).
So we have a lexical analyzer and then a parser.
Semantic Analysis
Too hard for compilers. They do not have capabilities similar to human
understanding
However, compilers do perform analysis to understand the meaning and
catch inconsistencies
Programming languages define strict rules to avoid such ambiguities
{ int Amit = 3;
{ int Amit = 4;
cout << Amit;
}
}
Compilers perform many other checks besides variable bindings
Type checking
Amit left her work at home
There is a type mismatch between her and Amit. Presumably Amit is a male.
And they are not the same person.
Compiler structure


Front End Phases
Lexical Analysis
Recognize tokens and ignore white spaces, comments

Generates token stream


Error reporting
Model using regular expressions
If the token is not valid (does not fall into any of the identifiable groups)
then we have to tell the user about it.
Recognize using Finite State Automata
1.4 Syntax Analysis
Check syntax and construct abstract syntax tree


Error reporting and recovery
Error recovery is a very important part of the syntax analyzer. The compiler
should not stop when it sees an error; it should keep processing the input. It should
report each error only once and give appropriate error messages, and it should also
suggest possible corrections. So the analyzer should produce good quality error
messages, and the amount of code skipped in order to recover from an error should
be minimal.
Model using context free grammars
Recognize using Push down automata/Table Driven Parsers
Semantic Analysis
Check semantics
Error reporting
The different kinds of errors include type mismatches
Disambiguate overloaded operators
+ is overloaded: it denotes both integer and real-number addition. So for a + b we
also need to determine the types of the operands, because the opcodes for integer
and floating-point addition may be different
Type coercion
We first convert the type of a (int) to float internally and then perform the
operation on floats. No type coercion is needed if the language does not allow
type mixing.


Static checking
Type checking
Control flow checking
Uniqueness checking
Name checks
We will have to generate type information for every node. It is not important
for nodes like if and ;
Code Optimization
No strong counterpart in English, but it is similar to editing/précis writing
Automatically modify programs so that they
Run faster
Use less resources (memory, registers, space, fewer fetches etc.)
Some common optimizations
Common sub-expression elimination
A = X + Y; ... ; P = R + X + Y: save the value of X + Y to
prevent recalculation
Copy propagation
whenever we are using a copy of a variable, instead of using
the copy we should be able to use the original variable itself
Dead code elimination: remove code whose results are never used
Code motion: move code that need not be inside a loop out of the loop
Strength reduction
2*x is replaced by x + x because addition is cheaper than
multiplication; x/2 is replaced by a right shift.
Constant folding
Example: x = 15 * 3 is transformed to x = 45
Compilation is done only once and execution many times
20-80 and 10-90 Rule: by 20% of the effort we can get 80%
speedup. For further 10% speedup we have to put 90% effort.
But in some programs we may need to extract every bit of
optimization


1.5 Code Generation
Usually a two step process
Generate intermediate code from the semantic representation of the
program
Generate machine code from the intermediate code
The advantage is that each phase is simple
Requires design of intermediate language
Most compilers perform translation between successive intermediate
representations
token stream → abstract syntax tree → annotated abstract syntax
tree → intermediate code
Intermediate languages are generally ordered in decreasing level of
abstraction from highest (source) to lowest (machine)
However, typically the one after the intermediate code generation is the most
important
Intermediate Code Generation
Abstraction at the source level
identifiers, operators, expressions, control flow, statements, conditionals,
iteration, functions (user defined, system defined or libraries)
Abstraction at the target level
memory locations, registers, stack, opcodes, addressing modes, system
libraries, interface to the operating systems
Code generation is mapping from source level abstractions to target machine
abstractions
Map identifiers to locations (memory/storage allocation)
Explicate variable accesses (change identifier references to
relocatable/absolute addresses)
Map source operators to opcodes or a sequence of opcodes
Convert conditionals and iterations to a test/jump or compare instructions
Layout parameter passing protocols: locations for parameters, return values,
layout of activations frame etc.
we should know where to pick up the parameters for function from and at
which location to store the result.
Interface calls to libraries, the runtime system, and the operating system:
application library, runtime library, OS system calls
Post translation Optimizations
Algebraic transformations and re-ordering
Remove/simplify operations like
Multiplication by 1
Multiplication by 0
Addition with 0
Reorder instructions based on
Commutative properties of operators
For example x+y is same as y+x (always?)
Instruction selection
Addressing mode selection
Opcode selection
Peephole optimization


Information required about the program variables during compilation
Class of variable: keyword, identifier etc.
Type of variable: integer, float, array, function etc.
Amount of storage required
Address in the memory
Scope information
Location to store this information
Store the attributes with the variable itself (has obvious problems)
We would need a large amount of memory, and for a statement like a = a + b
we would have to make changes in all the structures associated with a, so
consistency would be a problem
At a central repository and every phase refers to the repository
whenever information is required
Normally the second approach is preferred
Use a data structure called symbol table
Final Compiler structure

Advantages of the model
Also known as Analysis-Synthesis model of compilation
Front end phases are known as analysis phases
Back end phases known as synthesis phases
Each phase has a well defined work
Each phase handles a logical activity in the process of compilation
Compiler is retargetable
Source and machine independent code optimization is possible.
Optimization phase can be inserted after the front and back end phases have
been developed and deployed

Specifications and Compiler Generator
How to write specifications of the source language and the target machine
Language is broken into sub components like lexemes, structure,
semantics etc.
Each component can be specified separately. For example, an
identifier may be specified as
a string of characters that contains at least one letter and
starts with a letter followed by alphanumeric characters:
letter (letter | digit)*
Similarly syntax and semantics can be described



Tool based Compiler Development


Retarget Compilers
Changing specifications of a phase can lead to a new compiler
If machine specifications are changed then compiler can generate
code for a different machine without changing any other phase
If front end specifications are changed then we can get compiler for a
new language
Tool based compiler development cuts down development/maintenance time
by almost 30-40%
Tool development/testing is one time effort
Compiler performance can be improved by improving a tool and/or
specification for a particular phase


Bootstrapping
Compiler is a complex program and should not be written in assembly
language
How to write compiler for a language in the same language (first time!)?
First time this experiment was done for Lisp
Initially, Lisp was used as a notation for writing functions.
Functions were then hand translated into assembly language and executed
McCarthy wrote a function eval[e,a] in Lisp that took a Lisp expression e as
an argument
The function was later hand translated and it became an interpreter for Lisp
A compiler can be characterized by three languages: the source language
(S), the target language (T), and the implementation language (I)
The three language S, I, and T can be quite different. Such a compiler is
called cross-compiler
This is represented by a T-diagram as:

In textual form this can be represented as the triple (S, I, T): source
language S, implementation language I, target language T.
Write a cross compiler for a language L in implementation language S to
generate code for machine N; call it (L, S, N).
An existing compiler for S runs on a different machine M and generates code
for M; call it (S, M, M).
When compiler (L, S, N) is run through (S, M, M), we get compiler (L, M, N):
a compiler for L that runs on machine M and generates code for N.

Bootstrapping a Compiler

Bootstrapping a Compiler:
the Complete picture




1.7 Parameter Passing
Some routines and calls in external Fortran classes are compiled using the
Fortran parameter passing convention. This section describes how this is achieved.
Routines without bodies in external Fortran classes and Fortran routines (routines
whose return types and all arguments are Fortran types) are compiled as described
below. The explanation is done in terms of mapping the original Sather signatures
to C prototypes. All Fortran types are assumed to have corresponding C types
defined. For example, F_INTEGER class maps onto F_INTEGER C type. See
unnamedlink for details on how this could be achieved in a portable fashion. The
examples are used to illustrate parameter passing only - the actual binding of
function names is irrelevant for this purpose.
1.7.1. Return Types
Routines that return F_INTEGER, F_REAL, F_LOGICAL, and F_DOUBLE map
to C functions that return corresponding C types. A routine that returns
F_COMPLEX or F_DOUBLE_COMPLEX is equivalent to a C routine with an
extra initial argument preceding the other arguments in the argument list. This
initial argument points to the storage for the return value.
F_COMPLEX foo(i:F_INTEGER,a:F_REAL);
-- this Sather signature is equivalent to
void foo(F_COMPLEX* ret_val, F_INTEGER* i_address, F_REAL* a_address)
A routine that returns F_CHARACTER is mapped to a C routine with two
additional arguments: a pointer to the data, and a string size, always set to 1 in the
case of F_CHARACTER.
F_CHARACTER foo(i:F_INTEGER, a:F_REAL);
-- this Sather signature maps to
void foo(F_CHARACTER* address, F_LENGTH size,
F_INTEGER* i_address, F_REAL* a_address);
Similarly, a routine returning F_STRING is equivalent to a C routine with two
additional initial arguments: a data pointer and a string length.[1]
F_STRING foo(i:F_INTEGER, a:F_REAL);
-- this Sather signature maps to
void foo(F_CHARACTER* address, F_LENGTH size,
F_INTEGER* i, F_REAL* a);
[1] The current Sather 1.1 implementation disallows returning Fortran strings of
size greater than 32 bytes. This restriction may be lifted in the future releases.
1.7.2. Argument Types
All Fortran arguments are passed by reference. In addition, for each argument of
type F_CHARACTER or F_STRING, an extra parameter whose value is the length
of the string is appended to the end of the argument list.
foo(i:F_INTEGER,c:F_CHARACTER,a:F_REAL):F_INTEGER
-- this is mapped to
F_INTEGER foo(F_INTEGER* i_address, F_CHARACTER*c_address,
F_REAL* a_address, F_LENGTH c_length);
-- all calls have c_length set to 1
foo(i:F_INTEGER,s:F_STRING,a:F_REAL):F_INTEGER
-- this is mapped to
F_INTEGER foo(F_INTEGER* i_address, F_CHARACTER* s_address,
F_REAL* a_address, F_LENGTH s_length);
-- the proper s_length is supplied by the caller
Additional string length arguments are passed by value. If there is more than one
F_CHARACTER or F_STRING argument, the lengths are appended to the end of
the list in the textual order of the string arguments:
foo(s1:F_STRING,i:F_INTEGER,s2:F_STRING,a:F_REAL);
-- this is mapped to
void foo(F_CHARACTER* s1_address, F_INTEGER* i_address,
F_CHARACTER* s2_address, F_REAL* a_address,
F_LENGTH s1_length, F_LENGTH s2_length);
Sather signatures that have F_HANDLER arguments correspond to C integer
functions whose return value represents the alternate return to take. The actual
handlers are not passed to the Fortran code. Instead, code to do the branching
based on the return value is emitted by the Sather compiler to conform to the
alternate return semantics.
Arguments of type F_ROUT are passed as function pointers.
Thus, the entire C argument list including additional arguments consists of:
- one additional argument due to F_COMPLEX or F_DOUBLE_COMPLEX
return type, or two additional arguments due to F_CHARACTER or
F_STRING return type
- references to "normal" arguments corresponding to a Sather signature
argument list
- additional arguments for each F_CHARACTER or F_STRING argument in
the Sather signature
The following example combines all rules
foo(s1:F_STRING, i:F_INTEGER, a:F_REAL,
c:F_CHARACTER):F_COMPLEX
-- is mapped to
void foo(F_COMPLEX* ret_address, F_CHARACTER* s1_address,
F_INTEGER* i_address, F_REAL* a_address,
F_CHARACTER* c_address, F_LENGTH s1_length,
F_LENGTH c_length);
-- all Sather calls have c_length set to 1
1.7.3. OUT and INOUT Arguments
Sather 1.1 provides the extra flexibility of 'out' and 'inout' argument modes
for Fortran calls. The Sather compiler ensures that the semantics of 'out' and 'inout'
is preserved even when calls cross the Sather language boundaries. In particular,
the changes to such arguments are not observed until the call is complete - thus the
interlanguage calls have the same semantics as regular Sather calls.
This additional mechanism makes the semantics of some arguments visually
explicit and consequently helps catch some bugs caused by the modification of 'in'
arguments (all Fortran arguments are passed by reference, and Fortran code can
potentially modify all arguments without restrictions.) A special compiler option
may enable checking the invariance of Fortran 'in' argument.
The ICSI Sather 1.1 compiler currently does not implement this
functionality.
In the case of calling Fortran code, the Sather compiler ensures that the
value/result semantics is preserved by the caller - the Sather compiler has no
control over external Fortran code. This may involve copying 'inout' arguments to
temporaries and passing references to these temporaries to Fortran. In the case of
Sather routines that are called from Fortran, the Sather compiler emits a special
prologue for such routines to ensure the value/result semantics for the Fortran
caller. In summary, the value/result semantics for external calls to Fortran is
ensured by the caller, and for Sather routines that are meant to be called by Fortran
it is implemented by the callee.
This example suggests what a Sather signature for a Fortran routine that swaps two integers may look like:
SUBROUTINE SWAP(A,B)
INTEGER A,B

-- a Sather signature may look like
swap(inout a:F_INTEGER, inout b:F_INTEGER);
Note that using argument modes in this example makes the semantics of the
routine more obvious.
In the following example, compiling the program with all checks on may reveal a
bug due to the incorrect modification of the vector sizes:
SUBROUTINE ADD_VECTORS(A,B,RES,size)
REAL A(*),B(*),RES(*)
INTEGER SIZE

-- Sather signature
add_vectors(a,b,res:F_ARRAY{F_REAL}, size:F_INTEGER)
-- size is an 'in' parameter and cannot be modified by Fortran code
In addition to extra debugging capabilities, 'in' arguments are passed slightly more
efficiently than 'out' and 'inout' arguments.
Points to note
- F_ROUT and F_HANDLER types cannot be "out" or "inout" arguments.
Parameter Passing, Calls, Symbol Tables & IRs
Parameter Passing
Three semantic classes (semantic models) of parameters
- IN: pass value to subprogram
- OUT: pass value back to caller
- INOUT: pass value in and back
- Implementation alternatives
- Copy value
- Pass an access path (e.g. a pointer)

Parameter Passing Methods
Pass-by-Value
Pass-by-Reference
Pass-by-Result
Pass-by-Value-Result

Pass-by-value
Copy actual into formal
Default in many imperative languages
Only kind used in C and Java
Used for IN parameter passing
Actual can typically be an arbitrary expression,
including constants & variables

Pass-by-value cont.
Advantage
Cannot modify actuals
So IN is automatically enforced
Disadvantage
Copying of large objects is expensive
Don't want to copy a whole array on each call!
Implementation
Formal allocated on stack like a local variable
Value initialized with actual
Optimization sometimes possible: keep only in register

Pass-by-result
Used for OUT parameters
No value transmitted to subprogram
Actual MUST be a variable (more precisely, an l-value)
foo(x) and foo(a[1]) are fine but not foo(3) or
foo(x * y)
Pass-by-result gotchas
procedure foo(out int x, out int y) {
g := 4; x := 42;
y := 0; }
main() {
b: array[1..10] of integer;
g: integer; g = 0;
call to foo: }

Pass-by-value-result
Implementation model for in-out parameters
Simply a combination of pass by value and pass by result
Same advantages & disadvantages
Actual must be an l-value
Pass-by-reference
Also implements IN-OUT
Pass an access path, no copy is performed
Advantages:
Efficient, no copying, no extra space
Disadvantages
Parameter access usually slower (via indirection)
If only IN is required, may change value inadvertently
Creates aliases

Pass-by-reference aliases
int g;
void foo(int& x) {
x = 1;
}
foo(g);
g and x are aliased
Pass-by-name
Textual substitution of argument in subprogram
Used in Algol for in-out parameters, C macro preprocessor
evaluated at each reference to formal parameter in subprogram
Subprogram can change values of variables used in argument expression
Programmer must rename variables in subprogram in case of name clashes
Evaluation uses reference environment of caller
Jensen's device
real procedure sigma(x, i, n);
value n;
real x; integer i, n;
begin
real s;
s := 0;
for i := 1 step 1 until n do
s := s + x;
sigma := s;
end
What does sigma(a(i), i, 10) do? It computes a(1) + a(2) + ... + a(10), because each
reference to x in the body re-evaluates a(i) in the caller's environment with the
current value of i.
Pass-by-name Safety Problem
procedure swap(a, b);
integer a,b,temp;
begin
temp := a;
a := b;
b := temp;
end;
swap(x, y) works as expected, but swap(i, x(i)) does not: after temp := i and
i := x(i), the final assignment x(i) := temp uses the new value of i and updates
the wrong element.
Call-by-name Implementation
Variables & constants easy
Reference & copy
Expressions are harder
Have to use parameterless procedures, a.k.a. thunks
Procedures as Parameters
In some languages procedures are first-class citizens, i.e., they can be assigned to
variables, passed as arguments like any other data types
-Even C, C++, Pascal have some (limited) support for procedural parameters
-Major use: can write more general procedures, e.g. standard library in C:
void qsort(void* base, size_t nmemb, size_t size,
int (*compar)(const void*, const void*));
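For example, sorting an int array with qsort by passing a comparison routine as the procedural parameter:
#include <stdio.h>
#include <stdlib.h>

/* Comparison routine passed to qsort as a procedural parameter. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);      /* negative, zero, or positive */
}

int main(void) {
    int v[] = { 42, 7, 19, 3 };
    qsort(v, sizeof v / sizeof v[0], sizeof v[0], cmp_int);
    for (int i = 0; i < 4; i++) printf("%d ", v[i]);   /* prints: 3 7 19 42 */
    printf("\n");
    return 0;
}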
Design Issues
Typechecking
-Are procedural parameters included?
-May not be possible in independent compilation
-Type-loophole in original Pascal: was not checked, but later a procedure type was
required in the formal declaration
Procedures as parameters
How do we implement static scope rules?
= how do we set up the static link?
program param(input, output);
procedure b(function h(n : integer):integer);
begin writeln(h(2)) end {b};
procedure c;
var m : integer;
function f(n : integer) : integer;
begin f := m + n end {f};
begin m := 0; b(f) end {c};
begin c
end.
Solution: pass static link:






Procedure Calls
5 Steps during procedure invocation
Procedure call (caller)
Procedure prologue (callee)
Procedure execution (callee)
Procedure epilogue (callee)
Caller restores execution environment and receives return value
(caller)
The Call
Steps during procedure invocation
Each argument is evaluated and put in corresponding register or stack
location
Address of called procedure is determined
In most cases already known at compile / link time
Caller-saved registers in use are saved in memory (on the stack)
Static link is computed
Return address is saved in a register and branch to callees code is
executed
The Prologue
Save fp; fp := sp; sp := sp - frame size
Callee-saved registers used by callee are saved in memory
Construct display (if used in lieu of static link)
The Epilogue
Callee-saved registers that were saved are restored
Restore old sp and fp
Put return value in return register / stack location
Branch to return address
Post Return
Caller restores caller-saved registers that were saved
Return value is used
Division of caller-saved vs callee-saved is important
Reduces number of register saves
4 classes: caller-saved, callee-saved, temporary and dedicated
Best division is program dependent so calling convention is a compromise
Argument Registers
Additional register class used when many GPRs available
Separate for integer and floating point arguments
Additional arguments passed on stack
Access via fp+offset
Return values
Return value register or memory if too large
Could be allocated in callers or callees space
Callee's space: not reentrant!
Caller's space:
Pass a pointer to the caller's return value space
If the size is provided as well, the callee can check for fit
Procedure Calls with Register Windows








Symbol Tables
Maps symbol names to attributes
Common attributes
Name: String
Class: Enumeration (storage class)
Volatile: Boolean
Size: Integer
Bitsize: Integer
Boundary: Integer
Bitbdry: Integer
Type: Enumeration or Type referent
Basetype: Enumeration or Type referent
Machtype: Enumeration
Nelts: Integer
Register: Boolean
Reg: String (register name)
Basereg: String
Disp:
Symbol Table Operations
New_Sym_Tab:SymTab -> SymTab
Dest_Sym_Tab:SymTab -> SymTab
Destroys symtab and returns parent
Insert_Sym:SymTab X Symbol -> boolean
Returns false if already present, otherwise inserts and returns true
Locate_Sym:SymTab X Symbol -> boolean
Get_Sym_Attr:SymTab X Symbol x Attr -> Value
Set_Sym_Attr:SymTab X Symbol x Attr X Value -> boolean
Next_Sym:SymTab X Symbol -> Symbol
More_Syms:SymTab X Symbol -> boolean
Implementation Goals
Fast insertion and lookup operations for symbols and attributes
Alternatives
Balanced binary tree
Hash table (the usual choice)
Open addressing or
Buckets (commonly used)
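A minimal sketch in C of the bucket (chained) hash-table approach; the struct fields and names here are illustrative assumptions, not a prescribed layout:
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211

typedef struct Symbol {
    char *name;
    int   type;                  /* stands in for the attribute fields listed above */
    struct Symbol *next;         /* chain of symbols hashing to the same bucket */
} Symbol;

typedef struct SymTab {
    Symbol *bucket[NBUCKETS];
    struct SymTab *parent;       /* enclosing scope, for Dest_Sym_Tab */
} SymTab;

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Insert_Sym: returns 0 if the name is already present in this scope. */
int insert_sym(SymTab *t, const char *name, int type) {
    unsigned h = hash(name);
    for (Symbol *s = t->bucket[h]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return 0;
    Symbol *s = malloc(sizeof *s);
    s->name = strdup(name);
    s->type = type;
    s->next = t->bucket[h];
    t->bucket[h] = s;
    return 1;
}

/* Locate_Sym: search this scope only; a full lookup would also walk t->parent. */
Symbol *locate_sym(SymTab *t, const char *name) {
    for (Symbol *s = t->bucket[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}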

Scoping and Symbol Tables
Nested scopes (e.g. Pascal) can be represented as a tree
Implement by pushing / popping symbol tables on/off a symbol table stack
More efficient implementation with two stacks
Scoping with Two Stacks








Visibility versus Scope
So far we have assumed scope ≈ visibility
Visibility directly corresponds to scope: this is called open scope
Closed scope = visibility is explicitly specified
Arises in module systems (import) and inheritance mechanisms in OO languages
Can be implemented by adding a list of scope level numbers in which a symbol is
visible
An optimized implementation needs just one scope number
Stack represents declared scope or outermost exported scope
Hash table implements visibility by reordering the hash chain
Intermediate Representations
Make optimizer independent of source and target language
Usually multiple levels
HIR = high-level IR: encodes source language semantics
Can express language-specific optimizations
MIR = medium-level IR: a representation suitable for multiple source and target languages
Can express source/target-independent optimizations
LIR = low-level IR: a representation with many target-specific details
Can express target-specific optimizations
IR Goals
Primary goals
Easy & effective analysis
Few cases
Support for things of interest
Easy transformations
General across source / target languages
Secondary goals
Compact in memory
Easy to translate from / to
Debugging support
Extensible & displayable
High-Level IRs
Abstract syntax tree + symbol table most common
LISP S-expressions
Medium-level IRs
Represent source variables + temporaries and registers
Reduce control flow to conditional + unconditional branches
Explicit operations for procedure calls and block structure
Most popular: three-address code
t1 := t2 op t3 (each instruction has at most 3 addresses/operands)
if t goto L
t1 := t2 < t3
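For instance, the source statement a := b * c + b * c might be lowered to the three-address code below (t1..t3 are compiler-introduced temporaries); local common-subexpression elimination could later replace the second multiplication with a reuse of t1:
t1 := b * c
t2 := b * c
t3 := t1 + t2
a  := t3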
Important MIRs
SSA = static single assignment form
Like 3-address code but every variable has exactly one reaching definition
Makes variables independent of the locations they are in
Makes many optimization algorithms more effective
SSA Example
Before SSA:            After SSA renaming:
x := u                 x1 := u
... use of x           ... use of x1
x := v                 x2 := v
... use of x           ... use of x2
Other Representations
Triples
(1) i + 1
(2) i := (1)
(3) i + 1
(4) p + 4
(5) *(4)
(6) p := (4)
Trees
Like AST but at lower level
Directed Acyclic Graphs (DAGs)
More compact than trees through node sharing
Three Address Code Example





Representation Components
-Operations
-Dependences between operations
-Control dependences: sequencing of operations
-Evaluation of then & else depend on result of test
-Side effects of statements occur in right order
-Data dependences: flow of values from definitions to uses
-Operands computed before operation
-Values read from variable before being overwritten
-Want to represent only relevant dependences
-Dependences constrain operations, so the fewer the better
Representing Control Dependence
Implicit in AST
Explicit as Control Flow Graphs (CFGs)
Nodes are basic blocks
Instructions within a block are in sequence, which orders their side effects
Edges represent branches (control flow between blocks)
Fancier:
Control Dependence Graph
Part of the PDG (program dependence graph)
Value dependence graph (VDG)
Control dependence converted to data dependence

Data Dependence Kinds
True (flow) dependence (read after write RAW)
Reflects real data flow, operands to operation
Anti-dependence (WAR)
Output dependence (WAW)
Reflects overwriting of memory, not real data flow
Can often be eliminated
Data Dependence Example








Representing Data Dependences (within bbs)
-Sequence of instructions
-Simple
-Easy analysis
-But: may overconstrain operation order
-Expression tree / DAG
-Directly captures dependences in block
-Supports local CSE (common subexpression elimination)
-Can be compact
-Harder to analyze & transform
-Eventually has to be linearized
Representing Data Dependences (across blocks)
-Implicit via def-use
-Simple
-Makes analysis slow (have to compute dependences each time)
-Explicit: def-use chains
-Fast
-Space-consuming
-Has to be updated after transformations
-Advanced options:
-SSA
-VDGs
-Dependence flow graphs (DFGs)
1.8 Semantics of Calls and Returns
- Def: The subprogram call and return operations of a language are together
called its subprogram linkage
1.9 Implementing Subprograms
The General Semantics of Calls and Returns
The subprogram call and return operations of a language are together called its
subprogram linkage.
A subprogram call in a typical language has numerous actions associated with
it.
The call must include the mechanism for whatever parameter-passing method is
used.
If local vars are not static, the call must cause storage to be allocated for the
locals declared in the called subprogram and bind those vars to that storage.
It must save the execution status of the calling program unit.
It must arrange to transfer control to the code of the subprogram and ensure that
control can return to the proper place when the subprogram execution is completed.
Finally, if the language allows nested subprograms, the call must cause some
mechanism to be created to provide access to non-local vars that are visible to
the called subprogram.
Implementing Simple Subprograms
Simple means that subprograms cannot be nested and all local vars are static.
The semantics of a call to a simple subprogram requires the following actions:
1. Save the execution status of the caller.
2. Carry out the parameter-passing process.
3. Pass the return address to the callee.
4. Transfer control to the callee.
The semantics of a return from a simple subprogram requires the following
actions:
1. If pass-by-value-result parameters are used, move the current values of
those parameters to their corresponding actual parameters.
2. If it is a function, move the functional value to a place the caller can get
it.
3. Restore the execution status of the caller.
4. Transfer control back to the caller.
The call and return actions require storage for the following:
Status information of the caller,
parameters,
return address, and
functional value (if it is a function)
These, along with the local vars and the subprogram code, form the complete
set of information a subprogram needs to execute and then return control to the
caller.
A simple subprogram consists of two separate parts:
The actual code of the subprogram, which is constant, and
The local variables and data, which can change when the subprogram is
executed. Both parts have fixed sizes.

The format, or layout, of the non-code part of an executing subprogram is called
an activation record, b/c the data it describes are only relevant during the
activation of the subprogram.
The form of an activation record is static.
An activation record instance is a concrete example of an activation record
(the collection of data for a particular subprogram activation)
B/c languages with simple subprograms do not support recursion, there can be
only one active version of a given subprogram at a time.
Therefore, there can be only a single instance of the activation record for a
subprogram.
One possible layout for activation records is shown below.





B/c an activation record instance for a simple subprogram has a fixed size, it
can be statically allocated.
The following figure shows a program consisting of a main program and three
subprograms: A, B, and C.


The construction of the complete program shown above is not done entirely by
the compiler.
In fact, b/c of independent compilation, MAIN, A, B, and C may have been
compiled on different days, or even in different years.
At the time each unit is compiled, the machine code for it, along with a list of
references to external subprograms is written to a file.
The executable program shown above is put together by the linker, which is
part of the O/S.
1.10 Implementing Subprograms with Stack-Dynamic Local Variables
One of the most important advantages of stack-dynamic local vars is support for
recursion.
More Complex Activation Records
Subprogram linkage in languages that use stack-dynamic local vars are more
complex than the linkage of simple subprograms for the following reasons:
o The compiler must generate code to cause the implicit allocation and
deallocation of local variables
o Recursion must be supported (adds the possibility of multiple
simultaneous activations of a subprogram), which means there can be
more than one instance of a subprogram at a given time, with one call
from outside the subprogram and one or more recursive calls.
o Recursion, therefore, requires multiple instances of activation records,
one for each subprogram activation that can exist at the same time.
o Each activation requires its own copy of the formal parameters and the
dynamically allocated local vars, along with the return address.
The format of an activation record for a given subprogram in most languages is
known at compile time.
In many cases, the size is also known for activation records b/c all local data is
of fixed size.
In languages with stack-dynamic local vars, activation record instances must be
created dynamically. The following figure shows the activation record for such
a language.

B/c the return address, dynamic link, and parameters are placed in the activation
record instance by the caller, these entries must appear first.
The return address often consists of a ptr to the code segment of the caller and
an offset address in that code segment of the instruction following the call.
The dynamic link points to the top of an instance of the activation record of the
caller.
In static-scoped languages, this link is used in the destruction of the current
activation record instance when the procedure completes its execution.
The stack top is set to the value of the old dynamic link.
The actual parameters in the activation record are the values or addresses
provided by the caller.
Local scalar vars are bound to storage within an activation record instance.
Local structure vars are sometimes allocated elsewhere, and only their
descriptors and a ptr to that storage are part of the activation record.
Local vars are allocated and possibly initialized in the called subprogram, so
they appear last.
Consider the following C skeletal function:
void sub(float total, int part)
{
int list[4];
float sum;

}
The activation record for sub is:



Activating a subprogram requires the dynamic creation of an instance of the
activation record for the subprogram.
B/c the call and return semantics specify that the subprogram called last is
the first to complete, it is reasonable to create instances of these activation
records on a stack.
This stack is part of the run-time system and is called run-time stack.
Every subprogram activation, whether recursive or non-recursive, creates a new
instance of an activation record on the stack.
This provides the required separate copies of the parameters, local vars, and
return address.
An Example without Recursion
Consider the following skeletal C program

void fun1(int x) {
int y;
... 2
fun3(y);
...
}
void fun2(float r) {
int s, t;
... 1
fun1(s);
...
}
void fun3(int q) {
... 3
}
void main() {
float p;
...
fun2(p);
...
}

The sequence of procedure calls in this program is:

main calls fun2
fun2 calls fun1
fun1 calls fun3

The stack contents for the points labeled 1, 2, and 3 are shown in the figure
below:

At point 1, only ARI for main and fun2 are on the stack.
When fun2 calls fun1, an ARI of fun1 is created on the stack.
When fun1 calls fun3, an ARI of fun3 is created on the stack.
When fun3's execution ends, its ARI is removed from the stack, and the
dynamic link is used to reset the stack top pointer.
A similar process takes place when fun1 and fun2 terminate.
After the return from the call to fun2 from main, the stack has only the ARI of
main.
In this example, we assume that the stack grows from lower addresses to higher
addresses.
The collection of dynamic links present in the stack at a given time is called the
dynamic chain, or call chain.
It represents the dynamic history of how execution got to its current position,
which is always in the subprogram code whose activation record instance is on
top of the stack.
References to local vars can be represented in the code as offsets from the
beginning of the activation record of the local scope.
Such an offset is called a local_offset.
The local_offset of a local variable can be determined by the compiler at
compile time, using the order, types, and sizes of vars declared in the
subprogram associated with the activation record.
Assume that all vars take one position in the activation record.
The first local variable declared would be allocated in the activation record two
positions plus the number of parameters from the bottom (the first two positions
are for the return address and the dynamic link)
The second local var declared would be one position nearer the stack top and so
forth; e.g., in fun1, the local_offset of y is 3.
Likewise, in fun2, the local_offset of s is 3; for t is 4.
Recursion
Consider the following C program which uses recursion:

int factorial(int n) {
<-----------------------------1
if (n <= 1)
return 1;
else return (n * factorial(n - 1));
<-----------------------------2
}
void main() {
int value;
value = factorial(3);
<-----------------------------3
}


The activation record format for the program is shown below:




Notice the additional entry in the ARI for the returned value of the function.
Figure below shows the contents of the stack for the three times that execution
reaches position 1 in the function factorial.
Each shows one more activation of the function, with its functional value
undefined.
The 1st ARI has the return address to the calling function, main.
The others have a return address to the function itself; these are for the
recursive calls.
Figure below shows the stack contents for the three times that execution reaches
position 2 in the function factorial.
Position 2 is meant to be the time after the return is executed but before the
ARI has been removed from the stack.
The 1st return from factorial returns 1. Thus, the ARI for that activation has a value
of 1 for its version of the parameter n.
That result, 1, is returned to the 2nd activation of
factorial to be multiplied by its parameter value for n, which is 2.
This returns the value 2 to the 1st activation of factorial to be multiplied by its
parameter value for n, which is 3, yielding the final functional value of 6, which
is then returned to the first call to factorial in main.

Stack contents at position 1 in factorial is shown below.


























Figure below shows the stack contents during execution of main and factorial.
Nested Subprograms
Some of the non-C-based static-scoped languages (e.g., Fortran 95, Ada,
JavaScript) use stack-dynamic local variables and allow subprograms to be
nested.
The Basics
All variables that can be non-locally accessed reside in some activation record
instance in the stack.
The process of locating a non-local reference:
Find the correct activation record instance in the stack in which the var was
allocated.
Determine the correct offset of the var within that activation record instance to
access it.
The Process of Locating a Non-local Reference:
Finding the correct activation record instance:
Only vars that are declared in static ancestor scopes are visible and can be
accessed.
Static semantic rules guarantee that all non-local variables that can be
referenced have been allocated in some activation record instance that is on the
stack when the reference is made.
A subprogram is callable only when all of its static ancestor subprograms are
active.
The semantics of non-local references dictates that the correct declaration is the
first one found when looking through the enclosing scopes, most closely nested
first.
Static Chains
A static chain is a chain of static links that connects certain activation record
instances in the stack.
The static link, static scope pointer, in an activation record instance for
subprogram A points to one of the activation record instances of A's static
parent.
The static link appears in the activation record below the parameters.
The static chain from an activation record instance connects it to all of its static
ancestors.
During the execution of a procedure P, the static link of its activation record
instance points to an activation record instance of P's static parent program unit.
That instance's static link points, in turn, to P's static grandparent program
unit's activation record instance, if there is one.
So the static chain links all the static ancestors of an executing subprogram, in
order of static parent first.
This chain can obviously be used to implement the access to non-local vars in
static-scoped languages.
When a reference is made to a non-local var, the ARI containing the var can be
found by searching the static chain until a static ancestor ARI is found that
contains the var.
B/c the nesting scope is known at compile time, the compiler can determine not
only that a reference is non-local but also the length of the static chain that must be
followed to reach the ARI that contains the non-local object.
A static_depth is an integer associated with a static scope whose value is the
depth of nesting of that scope.
main ----- static_depth = 0
A ----- static_depth = 1
B ----- static_depth = 2


C ----- static_depth = 1

The length of the static chain needed to reach the correct ARI for a non-local
reference to a var X is exactly the difference between the static_depth of the
procedure containing the reference to X and the static_depth of the
procedure containing the declaration for X
The difference is called the nesting_depth, or chain_offset, of the
reference.
The actual reference can be represented by an ordered pair of integers
(chain_offset, local_offset), where chain_offset is the number of links to the
correct ARI.

procedure A is
procedure B is
procedure C is

end; // C

end; // B

end; // A
The static_depths of A, B, and C are 0, 1, 2, respectively.
If procedure C references a var in A, the chain_offset of that reference
would be 2 (static_depth of C minus the static_depth of A).
If procedure C references a var in B, the chain_offset of that reference would
be 1 (static_depth of C minus the static_depth of B).
References to locals can be handled using the same mechanism, with a
chain_offset of 0.

procedure MAIN_2 is
X : integer;
procedure BIGSUB is
A, B, C : integer;
procedure SUB1 is
A, D : integer;
begin { SUB1 }
A := B + C; <-----------------------1

end; { SUB1 }
procedure SUB2(X : integer) is
B, E : integer;
procedure SUB3 is
C, E : integer;
begin { SUB3 }

SUB1;

E := B + A; <--------------------2
end; { SUB3 }
begin { SUB2 }

SUB3;

A := D + E; <-----------------------3
end; { SUB2 }
begin { BIGSUB }

SUB2(7);

end; { BIGSUB }
begin

BIGSUB;
end; { MAIN_2 }
The sequence of procedure calls is:
MAIN_2 calls BIGSUB
BIGSUB calls SUB2
SUB2 calls SUB3
SUB3 calls SUB1

The stack situation when execution first arrives at point 1 in this program is
shown below:
















At position 1 in SUB1:
A - (0, 3)
B - (1, 4)
C - (1, 5)

At position 2 in SUB3:
E - (0, 4)
B - (1, 4)
A - (2, 3)

At position 3 in SUB2:
A - (1, 3)
D - an error
E - (0, 5)
1.12 Overview of Memory Management, Virtual Memory, Process
Creation
Memory Management
Program data is stored in memory.
Memory is a finite resource: programs may need to reuse some of it.
Most programming languages provide two means of structuring
data stored in memory:
Stack:
memory space (stack frames) for storing data local to a function body.
The programming language provides facilities for automatically managing
stack-allocated data (i.e., the compiler emits code for allocating/freeing stack
frames).
(Aside: unsafe languages like C/C++ don't enforce the stack invariant,
which leads to bugs that can be exploited for code injection attacks.)
Heap:
memory space for storing data that is created by a function
but needed in a caller. (Its lifetime is unknown at compile time.)
Freeing/reusing this memory can be up to the programmer (C/C++).
(Aside: freeing memory twice or never freeing it also leads to many bugs
in C/C++ programs.)
Garbage collection automates memory management for Java/ML/C#/etc.
Explicit Memory Management
On Unix, libc provides a library that allows programmers to manage the
heap:
void *malloc(size_t n)
Allocates n bytes of storage on the heap and returns its address.
void free(void *addr)
Releases the memory previously allocated by malloc at address addr.
These are user-level library functions. Internally, malloc uses the brk (or sbrk)
system call to have the kernel allocate space to the process.

Simple Implementation: Free Lists
Arrange the blocks of unused memory in a free list.
Each block has a pointer to the next free block.
Each block keeps track of its size. (Stored before & after the data part.)
Each block has a status flag = allocated or unallocated. (Kept as a bit in the
first size word, assuming the size is a multiple of 2 so the last bit is unused.)
Malloc: walk down the free list, find a block big enough.
First fit? Best fit?
Free: insert the freed block into the free list.
Perhaps keep the list sorted so that adjacent blocks can be merged.
Problems:
Fragmentation ruins the heap
Malloc can be slow
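A minimal free-list sketch in C (names and layout are illustrative assumptions; real allocators are far more involved). For simplicity, all blocks stay on one list and allocated blocks are skipped during the first-fit walk:
#include <stddef.h>

typedef struct Block {
    size_t size;             /* size of the data part, in bytes          */
    int    free;             /* status flag: 1 = unallocated, 0 = in use */
    struct Block *next;      /* next block on the list                   */
} Block;

static Block *block_list;    /* head of the block list                   */

void *my_malloc(size_t n) {
    for (Block *b = block_list; b != NULL; b = b->next)
        if (b->free && b->size >= n) {   /* first fit */
            b->free = 0;
            return (void *)(b + 1);      /* data starts right after the header */
        }
    return NULL;   /* a real allocator would grow the heap with sbrk/brk here */
}

void my_free(void *p) {
    if (p == NULL) return;
    Block *b = (Block *)p - 1;           /* recover the header */
    b->free = 1;                         /* real code would also merge adjacent free blocks */
}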

Exponential Scaling / Buddy System
Keep an array of free lists: FreeList[i]
FreeList[i] points to a list of blocks of size 2^i
Malloc: round the requested size up to the nearest power of 2
When FreeList[i] is empty, divide a block from FreeList[i+1] into two
halves and put both chunks into FreeList[i]
Alternatively, merge together two adjacent (buddy) blocks from FreeList[i-1]
Free: put the freed block back into the appropriate free list
Malloc & free take O(1) time
This approach trades external fragmentation (within the heap
as a whole) for internal fragmentation (within each block).
Wasted space: ~30%
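To illustrate the size classes, here is a small helper (hypothetical name, assumption only) that maps a request to the index i such that FreeList[i] holds blocks of size 2^i:
#include <stddef.h>

size_t freelist_index(size_t n) {
    size_t i = 0, size = 1;
    while (size < n) { size <<= 1; i++; }   /* smallest power of two >= n */
    return i;
}
/* e.g. freelist_index(3) == 2 (block of size 4), freelist_index(8) == 3 */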
Manual memory management is cumbersome & error prone:
Freeing the same pointer twice is ill defined (seg fault or other bugs)
Calling free on a pointer not created by malloc (e.g. a pointer to an element
of an array) is also ill defined
malloc and free aren't modular: to properly free all allocated memory,
the programmer has to know what code owns each object. The owner code
must ensure free is called exactly once.
Not calling free leads to space leaks: memory that is never reclaimed
There are many examples of space leaks in long-running programs
Garbage collection:
Have the language runtime system determine when an allocated chunk of
memory will no longer be used and free it automatically.
But the garbage collector is usually the most complex part of a language's
runtime system.
Garbage collection does impose costs (performance, predictability)
Memory Use & Reachability
When is a chunk of memory no longer needed?
In general, this problem is undecidable.
We can approximate this information by freeing memory that can't be reached
from any root references.
A root pointer is one that might be accessible directly from the
program (i.e., roots are not in the heap).
Root pointers include pointer values stored in registers, in global
variables, or on the stack.
If a memory cell is part of a record (or other data structure)
that can be reached by traversing pointers from a root, it is live.
It is safe to reclaim all memory cells not reachable from a root
(such cells are garbage).
Reachability & Pointers
Starting from the stack, registers, & globals (the roots), determine which
objects in the heap are reachable by following pointers.
Reclaim any object that isn't reachable.
Requires being able to distinguish pointer values from other values
(e.g., ints).
Type-safe languages:
OCaml and SML/NJ use the low bit:
1 = it's a scalar, 0 = it's a pointer. (Hence 31-bit ints in OCaml.)
Java puts the tag bits in the object metadata (uses more space).
Type safety implies that casts can't introduce new pointers.
Also, pointers are abstract (references), so objects can be moved without
changing the meaning of the program.
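A sketch of the low-bit tagging convention just mentioned (OCaml/SML-style), purely illustrative: heap objects are word-aligned, so a real pointer always has a 0 in the low bit, and a 1 can mark a scalar.
#include <stdint.h>

typedef uintptr_t value;

static int is_scalar(value v)  { return (v & 1) == 1; }
static int is_pointer(value v) { return (v & 1) == 0; }

/* A 31-bit (or 63-bit) integer n is stored as 2n + 1. */
static value    tag_int(intptr_t n) { return ((value)n << 1) | 1; }
static intptr_t untag_int(value v)  { return (intptr_t)v >> 1; }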
Mark and Sweep Garbage Collection
Classic algorithm with two phases:
Phase 1: Mark
Start from the roots
Do a depth-first traversal, marking every object reached.
Phase 2: Sweep
Walk over all allocated objects and check for marks.
Unmarked objects are reclaimed.
Marked objects have their marks cleared.
Optional: compact all live objects in the heap by moving them adjacent to
one another. (Needs extra work & indirection to patch up pointers.)
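A minimal sketch of the mark phase in C (the object layout is an assumption for illustration): a depth-first traversal from the roots that sets a mark bit on every reachable object; sweep would then walk all allocated objects, reclaim the unmarked ones, and clear the marks on the rest.
#include <stddef.h>

typedef struct Obj {
    int marked;                 /* mark bit                         */
    size_t nfields;             /* number of outgoing pointers      */
    struct Obj *field[1];       /* the outgoing pointers themselves */
} Obj;

void mark(Obj *o) {
    if (o == NULL || o->marked) return;   /* already visited: also handles cycles */
    o->marked = 1;
    for (size_t i = 0; i < o->nfields; i++)
        mark(o->field[i]);                /* depth-first on the children */
}

/* The collector calls mark(r) for every root r: registers, globals,
   and pointers found on the stack. */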
Deutsch-Schorr-Waite (DSW) Algorithm
No need for a stack: it is possible to use the graph being
traversed itself to store the data necessary.
Idea: during the depth-first search, each pointer is followed only
once. The algorithm can reverse the pointers on the way
down and restore them on the way back up.
Mark a bit on each object traversed on the way down.
Two pointers:
curr: points to the current node
prev: points to the previous node
On the way down, flip pointers as you traverse them:
tmp := curr
curr := curr.next
tmp.next := prev
prev := tmp
Costs & Implications
Need to generalize to account for objects that have multiple
outgoing pointers.
The depth-first traversal terminates when there are no child
pointers or all children are already marked.
Accounts for cycles in the object graph.
The Deutsch-Schorr-Waite algorithm breaks objects during
the traversal.
All computation must be halted during the mark phase. (Bad for
concurrent programs!)
The mark & sweep algorithm reads all memory in use by the
program (even if it's garbage!)
Running time is proportional to the total amount of allocated memory
(both live and garbage).
Can pause the program for long times during garbage collection.
Virtual Memory
Background
Demand Paging
Process Creation
Page Replacement
Allocation of Frames
Thrashing
Operating System Examples

Virtual memory separation of user logical memory from physical
memory.
o Only part of the program needs to be in memory for execution.
o Logical address space can therefore be much larger than physical
address space.
o Allows address spaces to be shared by several processes.
o Allows for more efficient process creation.
Virtual memory can be implemented via:
o Demand paging
o Demand segmentation

Virtual Memory That is Larger Than Physical Memory

Demand paging
Bring a page into memory only when it is needed.
o Less I/O needed
o Less memory needed
o Faster response
o More users
Page is needed ⇒ reference to it
o invalid reference ⇒ abort
o not in memory ⇒ bring to memory
Page Fault
If there is ever a reference to a page, the first reference will trap to the
OS ⇒ page fault
The OS looks at another table to decide:
o Invalid reference ⇒ abort.
o Just not in memory ⇒ continue:
Get an empty frame.
Swap the page into the frame.
Reset tables, validation bit = 1.
Restart the instruction that caused the fault. Special cases:
o block move
o auto increment/decrement location
Steps in Handling a Page Fault

What if there is no free frame?
Page replacement: find some page in memory that is not really in use and swap
it out.
o Need a replacement algorithm.
o Performance: want an algorithm which will result in a minimum
number of page faults.
The same page may be brought into memory several times.
Performance of Demand Paging

Page Fault Rate: 0 ≤ p ≤ 1.0
o if p = 0, no page faults
o if p = 1, every reference is a fault
Effective Access Time (EAT)
- EAT = (1 − p) × memory access time
+ p × (page fault overhead
+ [swap page out]
+ swap page in
+ restart overhead)
Demand Paging Example
Memory access time = 1 microsecond
50% of the time the page that is being replaced has been modified and
therefore needs to be swapped out.
Swap page time = 10 msec = 10,000 microseconds
EAT = (1 − p) × 1 + p × 15,000
    ≈ 1 + 15,000 p (in microseconds)
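For instance, with these numbers, if one access in 1,000 causes a page fault (p = 0.001), then EAT ≈ 1 + 15,000 × 0.001 = 16 microseconds, i.e., demand paging slows memory access down by a factor of about 16.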
Process Creation
Virtual memory allows other benefits during process creation:
- Copy-on-Write
- Memory-Mapped Files

Copy-on-Write
Copy-on-Write (COW) allows both parent and child processes to initially
share the same pages in memory.

If either process modifies a shared page, only then is the page copied.
COW allows more efficient process creation as only modified pages are
copied.
Free pages are allocated from a pool of zeroed-out pages.
Memory-Mapped Files
Memory-mapped file I/O allows file I/O to be treated as routine memory
access by mapping a disk block to a page in memory.
A file is initially read using demand paging. A page-sized portion of the file
is read from the file system into a physical page. Subsequent reads/writes
to/from the file are treated as ordinary memory accesses.
Simplifies file access by treating file I/O through memory rather than read()
write() system calls.
Also allows several processes to map the same file allowing the pages in
memory to be shared.
Memory Mapped Files
Page Replacement
Prevent over-allocation of memory by modifying page-fault service routine
to include page replacement.
Use the modify (dirty) bit to reduce the overhead of page transfers: only modified
pages are written back to disk.
Page replacement completes the separation between logical memory and
physical memory: a large virtual memory can be provided on a smaller
physical memory.
Need For Page Replacement


Page Replacement Algorithms
Want lowest page-fault rate.
Evaluate algorithm by running it on a particular string of memory references
(reference string) and computing the number of page faults on that string.
In all our examples, the reference string is
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.


FIFO Page Replacement
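Worked counts for the reference string above (easily checked by hand): with 3 frames, FIFO incurs 9 page faults; with 4 frames it incurs 10. More frames can produce more faults under FIFO, which is Belady's anomaly.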

Optimal Algorithm
Replace page that will not be used for longest period of
time.
4 frames example with reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5:
6 page faults.
How do you know this (the future references)?
Used for measuring how well your algorithm performs.
Optimal Page Replacement


LRU Page Replacement

LRU Algorithm (Cont.)
- Stack implementation: keep a stack of page numbers in doubly linked form:
o Page referenced:
move it to the top
requires 6 pointers to be changed
o No search for a replacement

Use Of A Stack to Record The Most Recent Page References

LRU Approximation Algorithms
- Reference bit
o With each page associate a bit, initially = 0
o When a page is referenced, the bit is set to 1.
o Replace a page whose bit is 0 (if one exists). We do not know the order,
however.
- Second chance
o Needs the reference bit.
o Clock replacement.
o If the page to be replaced (in clock order) has reference bit = 1, then:
set the reference bit to 0.
leave the page in memory.
replace the next page (in clock order), subject to the same rules.

Second-Chance (clock) Page-Replacement Algorithm
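A minimal sketch of the second-chance (clock) victim selection in C; ref_bit[] and hand are hypothetical bookkeeping variables used only for illustration:
#define NFRAMES 64

static int ref_bit[NFRAMES];    /* reference bit for each frame */
static int hand;                /* current clock position       */

/* Pick a victim frame: skip (and clear) frames whose reference bit is 1. */
int choose_victim(void) {
    for (;;) {
        if (ref_bit[hand] == 1) {
            ref_bit[hand] = 0;              /* give the page a second chance */
            hand = (hand + 1) % NFRAMES;    /* advance the clock             */
        } else {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;                  /* replace the page in this frame */
        }
    }
}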

Counting Algorithms
- Keep a counter of the number of references that have been made to each
page.
- LFU Algorithm: replaces page with smallest count.
- MFU Algorithm: based on the argument that the page with the smallest
count was probably just brought in and has yet to be used.
Allocation of Frames
- Each process needs minimum number of pages.
- Example: the IBM 370 needs 6 pages to handle the SS MOVE instruction:
o instruction is 6 bytes, might span 2 pages.
o 2 pages to handle from.
o 2 pages to handle to.
- Two major allocation schemes.
o fixed allocation
o priority allocation
Fixed Allocation
- Equal allocation: e.g., if 100 frames and 5 processes, give each 20 pages.
- Proportional allocation: allocate according to the size of the process.
Let si = size of process pi, S = Σ si, and m = total number of frames;
then the allocation for pi is ai = (si / S) × m.
Example: with m = 64, s1 = 10 and s2 = 127:
a1 = (10 / 137) × 64 ≈ 5 frames, a2 = (127 / 137) × 64 ≈ 59 frames.




Priority Allocation
- Use a proportional allocation scheme using priorities rather than size.
- If process Pi generates a page fault,
o select for replacement one of its frames, or
o select for replacement a frame from a process with a lower priority
number.
Global vs. Local Allocation
- Global replacement: a process selects a replacement frame from the set of all
frames; one process can take a frame from another.
- Local replacement: each process selects from only its own set of allocated
frames.
Thrashing
- If a process does not have enough pages, the page-fault rate is very high.
This leads to:
o low CPU utilization.
o operating system thinks that it needs to increase the degree of
multiprogramming.
o another process added to the system.
- Thrashing ≡ a process is busy swapping pages in and out.
