Lecture 09

Announcements
• WA2 due today

– Electronic hand-in
Overview of Semantic Analysis – Or after class
– Or under my office door by 5pm
• PA2 due tonight

Lecture 9 – Electronic hand-in
• Regrades
– No more than one deduction for each error
– But you have to prove it to us . . .
Prof. Aiken CS 143 Lecture 9 1 Prof. Aiken CS 143 Lecture 9 2
Midterm Thursday Outline
• In class • The role of semantic analysis in a compiler

– SCPD students come to campus for the exam – A laundry list of tasks
• Material through lecture 8 • Scope

– Implementation: symbol tables
• Four sheets hand- or type-written notes
• Types
The Compiler So Far Why a Separate Semantic Analysis?
• Lexical analysis • Parsing cannot catch some errors

– Detects inputs with illegal tokens
• Some language constructs not context-free
• Parsing
– Detects inputs with ill-formed parse trees
• Semantic analysis
– Last “front end” phase
– Catches all remaining errors
1
What Does Semantic Analysis Do? Scope
• Checks of many kinds . . . coolc checks: • Matching identifier declarations with uses
1. All identifiers are declared – Important static analysis step in most languages
2. Types – Including COOL!
3. Inheritance relationships
4. Classes defined only once
5. Methods in a class defined only once
6. Reserved identifiers are not misused
And others . . .
• The requirements depend on the language

What’s Wrong? Scope (Cont.)
• Example 1 • The scope of an identifier is the portion of a

Let y: String  “abc” in y + 3 program in which that identifier is accessible
• Example 2 • The same identifier may refer to different

Let y: Int in x + 3 things in different parts of the program
– Different scopes for same name don’t overlap
Note: An example a property that is not context

• An identifier may have restricted scope
free.
Static vs. Dynamic Scope Static Scoping Example
• Most languages have static scope let x: Int <- 0 in

– Scope depends only on the program text, not run- {
time behavior
x;
– Cool has static scope
let x: Int <- 1 in
x;
• A few languages are dynamically scoped
– Lisp, SNOBOL
x;
– Lisp has changed to mostly static scoping }
– Scope depends on execution of the program
2
Static Scoping Example (Cont.) Dynamic Scope
let x: Int <- 0 in • A dynamically-scoped variable refers to the

{ closest enclosing binding in the execution of
x; the program
let x: Int <- 1 in
x; • Example
g(y) = let a  4 in f(3);
x;
f(x) = a;
}
Uses of x refer to closest enclosing definition • More about dynamic scope later in the course
Scope in Cool Scope in Cool (Cont.)
• Cool identifier bindings are introduced by • Not all kinds of identifiers follow the most-
– Class declarations (introduce class names) closely nested rule
– Method definitions (introduce method names)
– Let expressions (introduce object id’s) • For example, class definitions in Cool
– Formal parameters (introduce object id’s) – Cannot be nested
– Attribute definitions (introduce object id’s) – Are globally visible throughout the program
– Case expressions (introduce object id’s)
• In other words, a class name can be used

before it is defined
Example: Use Before Definition More Scope in Cool
Class Foo { Attribute names are global within the class in

. . . let y: Bar in . . . which they are defined
};
Class Foo {
Class Bar { f(): Int { a };
... a: Int  0;
}; }
3
More Scope (Cont.) Implementing the Most-Closely Nested Rule
• Method/attribute names have complex rules • Much of semantic analysis can be expressed as
a recursive descent of an AST
• A method need not be defined in the class in
which it is used, but in some parent class – Before: Process an AST node n
– Recurse: Process the children of n
– After: Finish processing the AST node n
• Methods may also be redefined (overridden)
• When performing semantic analysis on a

portion of the the AST, we need to know
which identifiers are defined
Implementing . . . (Cont.) Symbol Tables
• Example: the scope of let bindings is one • Consider again: let x: Int  0 in e
subtree of the AST: • Idea:
– Before processing e, add definition of x to current
let x: Int  0 in e definitions, overriding any other definition of x
– Recurse
– After processing e, remove definition of x and
restore old definition of x
• x is defined in subtree e • A symbol table is a data structure that tracks

the current bindings of identifiers
A Simple Symbol Table Implementation Limitations
• Structure is a stack • The simple symbol table works for let

– Symbols added one at a time
• Operations – Declarations are perfectly nested
– add_symbol(x) push x and associated info, such as
x’s type, on the stack • What doesn’t it work for?
– find_symbol(x) search stack, starting from top,
for x. Return first x found or NULL if none found
– remove_symbol() pop the stack
• Why does this work?

4
A Fancier Symbol Table Class Definitions
• enter_scope() start a new nested scope • Class names can be used before being defined
• find_symbol(x) finds current x (or null)
• We can’t check class names
• add_symbol(x) add a symbol x to the table – using a symbol table
– or even in one pass
• check_scope(x) true if x defined in current
scope • Solution
• exit_scope() exit current scope – Pass 1: Gather all class names
– Pass 2: Do the checking
We will supply a symbol table manager for your • Semantic analysis requires multiple passes
project – Probably more than two
Types Why Do We Need Type Systems?
• What is a type? Consider the assembly language fragment

– The notion varies from language to language
add $r1, $r2, $r3
• Consensus
– A set of values
– A set of operations on those values
What are the types of $r1, $r2, $r3?
• Classes are one instantiation of the modern

notion of type
Types and Operations Type Systems
• Certain operations are legal for values of each • A language’s type system specifies which
type operations are valid for which types
– It doesn’t make sense to add a function pointer and • The goal of type checking is to ensure that
an integer in C operations are used with the correct types
– Enforces intended interpretation of values,
– It does make sense to add two integers because nothing else will!
– But both have the same assembly language

implementation!
5
Type Checking Overview The Type Wars
• Three kinds of languages: • Competing views on static vs. dynamic typing
– Statically typed: All or almost all checking of types • Static typing proponents say:
is done as part of compilation (C, Java, Cool) – Static checking catches many programming errors at compile
time
– Avoids overhead of runtime type checks
– Dynamically typed: Almost all checking of types is
done as part of program execution (Scheme) • Dynamic typing proponents say:
– Static type systems are restrictive
– Untyped: No type checking (machine code) – Rapid prototyping difficult within a static type system
The Type Wars (Cont.) Types Outline
• In practice, most code is written in statically • Type concepts in COOL

typed languages with an “escape” mechanism
– Unsafe casts in C, Java • Notation for type rules
– Logical rules of inference
• It’s debatable whether this compromise
represents the best or worst of both worlds • COOL type rules
• General properties of type systems
Cool Types Type Checking and Type Inference
• The types are: • Type Checking is the process of verifying fully

– Class Names typed programs
– SELF_TYPE
• Type Inference is the process of filling in
• The user declares types for identifiers missing type information
• The compiler infers types for expressions • The two are different, but the terms are
– Infers a type for every expression often used interchangeably
6
Rules of Inference Why Rules of Inference?
• We have seen two examples of formal notation • Inference rules have the form
specifying parts of a compiler If Hypothesis is true, then Conclusion is true
– Regular expressions
– Context-free grammars • Type checking computes via reasoning
If E1 and E2 have certain types, then E3 has a
• The appropriate formalism for type checking certain type
is logical rules of inference
• Rules of inference are a compact notation for
“If-Then” statements
From English to an Inference Rule From English to an Inference Rule (2)
• The notation is easy to read with practice If e1 has type Int and e2 has type Int,
then e1 + e2 has type Int
• Start with a simplified system and gradually
add features (e1 has type Int  e2 has type Int) 
e1 + e2 has type Int
• Building blocks
– Symbol  is “and” (e1: Int  e2: Int)  e1 + e2: Int
– Symbol  is “if-then”
– x:T is “x has type T”
From English to an Inference Rule (3) Notation for Inference Rules
The statement • By tradition inference rules are written

(e1: Int  e2: Int)  e1 + e2: Int
Hypothesis1  Hypothesisn
is a special case of
Conclusion
Hypothesis1  . . .  Hypothesisn  Conclusion
• Cool type rules have hypotheses and
This is an inference rule conclusions
` e:T
• ` means “it is provable that . . .”
7
Two Rules Two Rules (Cont.)
• These rules give templates describing how to

i is an integer type integers and + expressions
[Int]
` i: Int
• By filling in the templates, we can produce
complete typings for expressions
` e1 : Int
è2 : Int
[Add]
` e1  e2 : Int
Example: 1 + 2 Soundness
• A type system is sound if

– Whenever ` e:T
– Then e evaluates to a value of type T
1 is an integer 2 is an integer
` 1: Int ` 2: Int
• We only want sound rules
` 1+2: Int – But some sound rules are better than others:
i is an integer
` i:Object
Type Checking Proofs Rules for Constants
• Type checking proves facts e: T

– Proof is on the structure of the AST
– Proof has the shape of the AST [Bool]
`false: Bool
– One type rule is used for each AST node
• In the type rule used for a node e:
– Hypotheses are the proofs of types of e’s s is a string constant
subexpressions [String]
` s: String
– Conclusion is the type of e
• Types are computed in a bottom-up pass over
the AST
8
Rule for New Two More Rules
new T produces an object of type T

– Ignore SELF_TYPE for now . . . ` e: Bool
[Not]
` e: Bool
[New]
` new T: T ` e1 : Bool
` e2 : T
[Loop]
` while e1 loop e2 pool: Object
A Problem A Solution
• What is the type of a variable reference? • Put more information in the rules!
x is an identifier [Var] • A type environment gives types for free

` x: ? variables
– A type environment is a function from
ObjectIdentifiers to Types
• The local, structural rule does not carry – A variable is free in an expression if it is not
enough information to give x a type. defined within the expression
Type Environments Modified Rules
Let O be a function from ObjectIdentifiers to The type environment is added to the earlier
Types rules:
i is an integer
[Int]
The sentence O ` i: Int
O ` e: T
is read: Under the assumption that variables O ` e1 : Int
have the types given by O, it is provable that
the expression e has the type T O ` e2
: Int
[Add]
O ` e1  e2 : Int
9
New Rules Let
And we can write new rules:

O[T0 /x] ` e1 : T1
[Let-No-Init]
O(x) = T [Var] O ` let x:T0 in e1 : T1
O ` x: T
O[T/y] means O modified to return T on
argument y
Note that the let-rule enforces variable scope
Notes Let with Initialization
• The type environment gives types to the free Now consider let with initialization:
identifiers in the current scope
O ` e0 :T0
• The type environment is passed down the AST
from the root towards the leaves O[T0/x] ` e1 : T1 [Let-Init]
O ` let x:T0  e0 in e1 : T1
• Types are computed up the AST from the
leaves towards the root
This rule is weak. Why?
Subtyping Assignment
• Define a relation  on classes • Both let rules are sound, but more programs
– XX typecheck with the second one
– X  Y if X inherits from Y
– X  Z if X  Y and Y  Z • More uses of subtyping:
• An improvement O(Id) =T0

O ` e0 :T O ` e1 : T1
T  T0 [Assign]
T1  T0
[Let-Init]
O[T0/x] ` e1 : T1 O ` Id  e1 : T1
O ` let x:T0  e0 in e1 : T1
10
Initialized Attributes If-Then-Else
• Let OC(x) = T for all attributes x:T in class C • Consider:

if e0 then e1 else e2 fi
• Attribute initialization is similar to let, except
for the scope of names • The result can be either e1 or e2
OC (Id) =T0 • The type is either e1’s type of e2’s type

OC ` e1 : T1
[Attr-Init] • The best we can do is the smallest supertype
T1  T0
OC ` Id:T0  e1 ; larger than the type of e1 or e2
Least Upper Bounds If-Then-Else Revisited
• lub(X,Y), the least upper bound of X and Y, is

Z if
– XZYZ O ` e0 : Bool
Z is an upper bound O ` e1 : T1
O ` e2 : T2 [If-Then-Else]
– X  Z’  Y  Z’  Z  Z’
Z is least among upper bounds O ` if
i e0 then e1 else e2 fi : lub(T1 ,T2 )
• In COOL, the least upper bound of two types

is their least common ancestor in the
inheritance tree
Case Method Dispatch
• The rule for case expressions takes a lub over • There is a problem with type checking method
all branches calls:
O ` e0 : T0
O ` e1 : T1
O
O `` ee00 :: T
T00

O[T1 //x
O[T / x1 1]] `è1e:1 T  1’
: 1T [Case] [Dispatch]
O ` en : Tn

OO`ìe0e.f(e 1 ,,en )
0.f(e1,...,e
:
n): ?
O[T / xn]] `èen: T
O[Tnn //x :T n’
n n n
• We need information about the formal
OO`ìcase
case ee00 of x11:T11 
) e11;
…;;xxnn:T ) eenn;; esac
:Tnn  lub(T11’,…,T
esac : lub(T ,,T 
n )
n’)
parameters and return type of f
11
Notes on Dispatch The Dispatch Rule Revisited
• In Cool, method and object identifiers live in

O,M ` e0 : T0
different name spaces
– A method foo and an object foo can coexist in the
O,M ` e1 : T1
same scope 
• In the type rules, this is reflected by a O,M ` en : Tn
separate mapping M for method signatures [Dispatch]
M(T0 , f) = (T1,,Tn,Tn+1
 )
M(C,f) = (T1,. . .Tn,Tn+1)
Ti  Ti for 1  i  n
means in class C there is a method f
O,M ` ie00.f(e
O,M 1 , ,enn):
.f(e1,...,e ) T’
:Tn+1

f(x1:T1,. . .,xn:Tn): Tn+1
Static Dispatch Static Dispatch (Cont.)
• Static dispatch is a variation on normal O,M ` e0 : T0

dispatch O,M ` e1 : T1

• The method is found in the class explicitly O,M ` en : Tn
named by the programmer T0  T [StaticDispatch]
M(T, f) = (T1,,Tn,Tn+1
 )
• The inferred type of the dispatch expression
Ti  Ti for 1  i  n
must conform to the specified type
O,M
O,M ìee00@T.f(e 1 , ,enn):
@T.f(e1,...,e ) :T 
T’n+1
n+1
The Method Environment More Environments
• The method environment must be added to all • For some cases involving SELF_TYPE, we need
rules to know the class in which an expression
• In most cases, M is passed down but not appears
actually used • The full type environment for COOL:
– Only the dispatch rules use M – A mapping O giving types to object id’s
– A mapping M giving types to methods
O,M ` e1 : Int – The current class C
O,M ` e2 : Int [Add]
O,M ` e1  e2 : Int
12
Sentences Type Systems
The form of a sentence in the logic is • The rules in this lecture are COOL-specific
– More info on rules for self next time
O,M,C ` e :T – Other languages have very different rules
• General themes
– Type rules are defined on the structure of
Example: expressions
O,M,C ` e1 : int – Types of variables are modeled by an environment
O,M,C ` e2 : int [Add]
• Warning: Type rules are very compact!
O,M,C ` e1  e2 : int
One-Pass Type Checking Implementing Type Systems
• COOL type checking can be implemented in a O,M,C ` e1 : Int

single traversal over the AST
O,M,C ` e2 : Int [Add]
• Type environment is passed down the tree O,M,C ` e1  e2 : Int

– From parent to child TypeCheck(Environment, e1 + e2) = {
T1 = TypeCheck(Environment, e1);
• Types are passed up the tree T2 = TypeCheck(Environment, e2);
– From child to parent Check T1 == T2 == Int;
return Int; }
13

Lecture 09

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture 09

Uploaded by

Copyright:

Available Formats

Announcements

• WA2 due today

• PA2 due tonight

Prof. Aiken CS 143 Lecture 9 1 Prof. Aiken CS 143 Lecture 9 2

Midterm Thursday Outline

• In class • The role of semantic analysis in a compiler

• Material through lecture 8 • Scope

Prof. Aiken CS 143 Lecture 9 3 Prof. Aiken CS 143 Lecture 9 4

The Compiler So Far Why a Separate Semantic Analysis?

• Lexical analysis • Parsing cannot catch some errors

Prof. Aiken CS 143 Lecture 9 5 Prof. Aiken CS 143 Lecture 9 6

• The requirements depend on the language

What’s Wrong? Scope (Cont.)

• Example 1 • The scope of an identifier is the portion of a

• Example 2 • The same identifier may refer to different

Note: An example a property that is not context

Prof. Aiken CS 143 Lecture 9 9 Prof. Aiken CS 143 Lecture 9 10

Static vs. Dynamic Scope Static Scoping Example

• Most languages have static scope let x: Int <- 0 in

Prof. Aiken CS 143 Lecture 9 11 Prof. Aiken CS 143 Lecture 9 12

let x: Int <- 0 in • A dynamically-scoped variable refers to the

Scope in Cool Scope in Cool (Cont.)

• In other words, a class name can be used

Example: Use Before Definition More Scope in Cool

Class Foo { Attribute names are global within the class in

Prof. Aiken CS 143 Lecture 9 17 Prof. Aiken CS 143 Lecture 9 18

• When performing semantic analysis on a

Implementing . . . (Cont.) Symbol Tables

• x is defined in subtree e • A symbol table is a data structure that tracks

A Simple Symbol Table Implementation Limitations

• Structure is a stack • The simple symbol table works for let

• Why does this work?

Prof. Aiken CS 143 Lecture 9 25 Prof. Aiken CS 143 Lecture 9 26

Types Why Do We Need Type Systems?

• What is a type? Consider the assembly language fragment

• Classes are one instantiation of the modern

Types and Operations Type Systems

– But both have the same assembly language

• Three kinds of languages: • Competing views on static vs. dynamic typing

Prof. Aiken CS 143 Lecture 9 31 Prof. Aiken CS 143 Lecture 9 32

The Type Wars (Cont.) Types Outline

• In practice, most code is written in statically • Type concepts in COOL

• General properties of type systems

Prof. Aiken CS 143 Lecture 9 33 Prof. Aiken CS 143 Lecture 9 34

Cool Types Type Checking and Type Inference

• The types are: • Type Checking is the process of verifying fully

Prof. Aiken CS 143 Lecture 9 35 Prof. Aiken CS 143 Lecture 9 36

Prof. Aiken CS 143 Lecture 9 37 Prof. Aiken CS 143 Lecture 9 38

From English to an Inference Rule From English to an Inference Rule (2)

From English to an Inference Rule (3) Notation for Inference Rules

The statement • By tradition inference rules are written

Prof. Aiken CS 143 Lecture 9 41 Prof. Aiken CS 143 Lecture 9 42

• These rules give templates describing how to

Prof. Aiken CS 143 Lecture 9 43 Prof. Aiken CS 143 Lecture 9 44

• A type system is sound if

Prof. Aiken CS 143 Lecture 9 45 Prof. Aiken CS 143 Lecture 9 46

Type Checking Proofs Rules for Constants

• Type checking proves facts e: T

Prof. Aiken CS 143 Lecture 9 47 Prof. Aiken CS 143 Lecture 9 48

new T produces an object of type T

Prof. Aiken CS 143 Lecture 9 49 Prof. Aiken CS 143 Lecture 9 50

x is an identifier [Var] • A type environment gives types for free