You are on page 1of 13

Announcements

• WA2 due today


– Electronic hand-in
Overview of Semantic Analysis – Or after class
– Or under my office door by 5pm

• PA2 due tonight


Lecture 9 – Electronic hand-in

• Regrades
– No more than one deduction for each error
– But you have to prove it to us . . .

Prof. Aiken CS 143 Lecture 9 1 Prof. Aiken CS 143 Lecture 9 2

Midterm Thursday Outline

• In class • The role of semantic analysis in a compiler


– SCPD students come to campus for the exam – A laundry list of tasks

• Material through lecture 8 • Scope


– Implementation: symbol tables
• Four sheets hand- or type-written notes
• Types

Prof. Aiken CS 143 Lecture 9 3 Prof. Aiken CS 143 Lecture 9 4

The Compiler So Far Why a Separate Semantic Analysis?

• Lexical analysis • Parsing cannot catch some errors


– Detects inputs with illegal tokens
• Some language constructs not context-free
• Parsing
– Detects inputs with ill-formed parse trees

• Semantic analysis
– Last “front end” phase
– Catches all remaining errors

Prof. Aiken CS 143 Lecture 9 5 Prof. Aiken CS 143 Lecture 9 6

1
What Does Semantic Analysis Do? Scope

• Checks of many kinds . . . coolc checks: • Matching identifier declarations with uses
1. All identifiers are declared – Important static analysis step in most languages
2. Types – Including COOL!
3. Inheritance relationships
4. Classes defined only once
5. Methods in a class defined only once
6. Reserved identifiers are not misused
And others . . .

• The requirements depend on the language


Prof. Aiken CS 143 Lecture 9 7 Prof. Aiken CS 143 Lecture 9 8

What’s Wrong? Scope (Cont.)

• Example 1 • The scope of an identifier is the portion of a


Let y: String  “abc” in y + 3 program in which that identifier is accessible

• Example 2 • The same identifier may refer to different


Let y: Int in x + 3 things in different parts of the program
– Different scopes for same name don’t overlap

Note: An example a property that is not context


• An identifier may have restricted scope
free.

Prof. Aiken CS 143 Lecture 9 9 Prof. Aiken CS 143 Lecture 9 10

Static vs. Dynamic Scope Static Scoping Example

• Most languages have static scope let x: Int <- 0 in


– Scope depends only on the program text, not run- {
time behavior
x;
– Cool has static scope
let x: Int <- 1 in
x;
• A few languages are dynamically scoped
– Lisp, SNOBOL
x;
– Lisp has changed to mostly static scoping }
– Scope depends on execution of the program

Prof. Aiken CS 143 Lecture 9 11 Prof. Aiken CS 143 Lecture 9 12

2
Static Scoping Example (Cont.) Dynamic Scope

let x: Int <- 0 in • A dynamically-scoped variable refers to the


{ closest enclosing binding in the execution of
x; the program
let x: Int <- 1 in
x; • Example
g(y) = let a  4 in f(3);
x;
f(x) = a;
}
Uses of x refer to closest enclosing definition • More about dynamic scope later in the course
Prof. Aiken CS 143 Lecture 9 13 Prof. Aiken CS 143 Lecture 9 14

Scope in Cool Scope in Cool (Cont.)

• Cool identifier bindings are introduced by • Not all kinds of identifiers follow the most-
– Class declarations (introduce class names) closely nested rule
– Method definitions (introduce method names)
– Let expressions (introduce object id’s) • For example, class definitions in Cool
– Formal parameters (introduce object id’s) – Cannot be nested
– Attribute definitions (introduce object id’s) – Are globally visible throughout the program
– Case expressions (introduce object id’s)

• In other words, a class name can be used


before it is defined
Prof. Aiken CS 143 Lecture 9 15 Prof. Aiken CS 143 Lecture 9 16

Example: Use Before Definition More Scope in Cool

Class Foo { Attribute names are global within the class in


. . . let y: Bar in . . . which they are defined
};
Class Foo {
Class Bar { f(): Int { a };
... a: Int  0;
}; }

Prof. Aiken CS 143 Lecture 9 17 Prof. Aiken CS 143 Lecture 9 18

3
More Scope (Cont.) Implementing the Most-Closely Nested Rule

• Method/attribute names have complex rules • Much of semantic analysis can be expressed as
a recursive descent of an AST
• A method need not be defined in the class in
which it is used, but in some parent class – Before: Process an AST node n
– Recurse: Process the children of n
– After: Finish processing the AST node n
• Methods may also be redefined (overridden)

• When performing semantic analysis on a


portion of the the AST, we need to know
which identifiers are defined
Prof. Aiken CS 143 Lecture 9 19 Prof. Aiken CS 143 Lecture 9 20

Implementing . . . (Cont.) Symbol Tables

• Example: the scope of let bindings is one • Consider again: let x: Int  0 in e
subtree of the AST: • Idea:
– Before processing e, add definition of x to current
let x: Int  0 in e definitions, overriding any other definition of x
– Recurse
– After processing e, remove definition of x and
restore old definition of x

• x is defined in subtree e • A symbol table is a data structure that tracks


the current bindings of identifiers
Prof. Aiken CS 143 Lecture 9 21 Prof. Aiken CS 143 Lecture 9 22

A Simple Symbol Table Implementation Limitations

• Structure is a stack • The simple symbol table works for let


– Symbols added one at a time
• Operations – Declarations are perfectly nested
– add_symbol(x) push x and associated info, such as
x’s type, on the stack • What doesn’t it work for?
– find_symbol(x) search stack, starting from top,
for x. Return first x found or NULL if none found
– remove_symbol() pop the stack

• Why does this work?


Prof. Aiken CS 143 Lecture 9 23 Prof. Aiken CS 143 Lecture 9 24

4
A Fancier Symbol Table Class Definitions

• enter_scope() start a new nested scope • Class names can be used before being defined
• find_symbol(x) finds current x (or null)
• We can’t check class names
• add_symbol(x) add a symbol x to the table – using a symbol table
– or even in one pass
• check_scope(x) true if x defined in current
scope • Solution
• exit_scope() exit current scope – Pass 1: Gather all class names
– Pass 2: Do the checking

We will supply a symbol table manager for your • Semantic analysis requires multiple passes
project – Probably more than two

Prof. Aiken CS 143 Lecture 9 25 Prof. Aiken CS 143 Lecture 9 26

Types Why Do We Need Type Systems?

• What is a type? Consider the assembly language fragment


– The notion varies from language to language
add $r1, $r2, $r3
• Consensus
– A set of values
– A set of operations on those values
What are the types of $r1, $r2, $r3?

• Classes are one instantiation of the modern


notion of type
Prof. Aiken CS 143 Lecture 9 27 Prof. Aiken CS 143 Lecture 9 28

Types and Operations Type Systems

• Certain operations are legal for values of each • A language’s type system specifies which
type operations are valid for which types

– It doesn’t make sense to add a function pointer and • The goal of type checking is to ensure that
an integer in C operations are used with the correct types
– Enforces intended interpretation of values,
– It does make sense to add two integers because nothing else will!

– But both have the same assembly language


implementation!
Prof. Aiken CS 143 Lecture 9 29 Prof. Aiken CS 143 Lecture 9 30

5
Type Checking Overview The Type Wars

• Three kinds of languages: • Competing views on static vs. dynamic typing

– Statically typed: All or almost all checking of types • Static typing proponents say:
is done as part of compilation (C, Java, Cool) – Static checking catches many programming errors at compile
time
– Avoids overhead of runtime type checks
– Dynamically typed: Almost all checking of types is
done as part of program execution (Scheme) • Dynamic typing proponents say:
– Static type systems are restrictive
– Untyped: No type checking (machine code) – Rapid prototyping difficult within a static type system

Prof. Aiken CS 143 Lecture 9 31 Prof. Aiken CS 143 Lecture 9 32

The Type Wars (Cont.) Types Outline

• In practice, most code is written in statically • Type concepts in COOL


typed languages with an “escape” mechanism
– Unsafe casts in C, Java • Notation for type rules
– Logical rules of inference
• It’s debatable whether this compromise
represents the best or worst of both worlds • COOL type rules

• General properties of type systems

Prof. Aiken CS 143 Lecture 9 33 Prof. Aiken CS 143 Lecture 9 34

Cool Types Type Checking and Type Inference

• The types are: • Type Checking is the process of verifying fully


– Class Names typed programs
– SELF_TYPE
• Type Inference is the process of filling in
• The user declares types for identifiers missing type information

• The compiler infers types for expressions • The two are different, but the terms are
– Infers a type for every expression often used interchangeably

Prof. Aiken CS 143 Lecture 9 35 Prof. Aiken CS 143 Lecture 9 36

6
Rules of Inference Why Rules of Inference?

• We have seen two examples of formal notation • Inference rules have the form
specifying parts of a compiler If Hypothesis is true, then Conclusion is true
– Regular expressions
– Context-free grammars • Type checking computes via reasoning
If E1 and E2 have certain types, then E3 has a
• The appropriate formalism for type checking certain type
is logical rules of inference
• Rules of inference are a compact notation for
“If-Then” statements

Prof. Aiken CS 143 Lecture 9 37 Prof. Aiken CS 143 Lecture 9 38

From English to an Inference Rule From English to an Inference Rule (2)

• The notation is easy to read with practice If e1 has type Int and e2 has type Int,
then e1 + e2 has type Int
• Start with a simplified system and gradually
add features (e1 has type Int  e2 has type Int) 
e1 + e2 has type Int
• Building blocks
– Symbol  is “and” (e1: Int  e2: Int)  e1 + e2: Int
– Symbol  is “if-then”
– x:T is “x has type T”
Prof. Aiken CS 143 Lecture 9 39 Prof. Aiken CS 143 Lecture 9 40

From English to an Inference Rule (3) Notation for Inference Rules

The statement • By tradition inference rules are written


(e1: Int  e2: Int)  e1 + e2: Int
Hypothesis1  Hypothesisn
is a special case of
Conclusion
Hypothesis1  . . .  Hypothesisn  Conclusion
• Cool type rules have hypotheses and
This is an inference rule conclusions
` e:T
• ` means “it is provable that . . .”

Prof. Aiken CS 143 Lecture 9 41 Prof. Aiken CS 143 Lecture 9 42

7
Two Rules Two Rules (Cont.)

• These rules give templates describing how to


i is an integer type integers and + expressions
[Int]
` i: Int
• By filling in the templates, we can produce
complete typings for expressions
` e1 : Int
`e2 : Int
[Add]
` e1  e2 : Int

Prof. Aiken CS 143 Lecture 9 43 Prof. Aiken CS 143 Lecture 9 44

Example: 1 + 2 Soundness

• A type system is sound if


– Whenever ` e:T
– Then e evaluates to a value of type T
1 is an integer 2 is an integer
` 1: Int ` 2: Int
• We only want sound rules
` 1+2: Int – But some sound rules are better than others:

i is an integer
` i:Object

Prof. Aiken CS 143 Lecture 9 45 Prof. Aiken CS 143 Lecture 9 46

Type Checking Proofs Rules for Constants

• Type checking proves facts e: T


– Proof is on the structure of the AST
– Proof has the shape of the AST [Bool]
`false: Bool
– One type rule is used for each AST node
• In the type rule used for a node e:
– Hypotheses are the proofs of types of e’s s is a string constant
subexpressions [String]
` s: String
– Conclusion is the type of e
• Types are computed in a bottom-up pass over
the AST

Prof. Aiken CS 143 Lecture 9 47 Prof. Aiken CS 143 Lecture 9 48

8
Rule for New Two More Rules

new T produces an object of type T


– Ignore SELF_TYPE for now . . . ` e: Bool
[Not]
` e: Bool

[New]
` new T: T ` e1 : Bool
` e2 : T
[Loop]
` while e1 loop e2 pool: Object

Prof. Aiken CS 143 Lecture 9 49 Prof. Aiken CS 143 Lecture 9 50

A Problem A Solution

• What is the type of a variable reference? • Put more information in the rules!

x is an identifier [Var] • A type environment gives types for free


` x: ? variables
– A type environment is a function from
ObjectIdentifiers to Types
• The local, structural rule does not carry – A variable is free in an expression if it is not
enough information to give x a type. defined within the expression

Prof. Aiken CS 143 Lecture 9 51 Prof. Aiken CS 143 Lecture 9 52

Type Environments Modified Rules

Let O be a function from ObjectIdentifiers to The type environment is added to the earlier
Types rules:
i is an integer
[Int]
The sentence O ` i: Int
O ` e: T
is read: Under the assumption that variables O ` e1 : Int
have the types given by O, it is provable that
the expression e has the type T O ` e2
: Int
[Add]
O ` e1  e2 : Int
Prof. Aiken CS 143 Lecture 9 53 Prof. Aiken CS 143 Lecture 9 54

9
New Rules Let

And we can write new rules:


O[T0 /x] ` e1 : T1
[Let-No-Init]
O(x) = T [Var] O ` let x:T0 in e1 : T1
O ` x: T
O[T/y] means O modified to return T on
argument y

Note that the let-rule enforces variable scope

Prof. Aiken CS 143 Lecture 9 55 Prof. Aiken CS 143 Lecture 9 56

Notes Let with Initialization

• The type environment gives types to the free Now consider let with initialization:
identifiers in the current scope

O ` e0 :T0
• The type environment is passed down the AST
from the root towards the leaves O[T0/x] ` e1 : T1 [Let-Init]
O ` let x:T0  e0 in e1 : T1
• Types are computed up the AST from the
leaves towards the root
This rule is weak. Why?

Prof. Aiken CS 143 Lecture 9 57 Prof. Aiken CS 143 Lecture 9 58

Subtyping Assignment

• Define a relation  on classes • Both let rules are sound, but more programs
– XX typecheck with the second one
– X  Y if X inherits from Y
– X  Z if X  Y and Y  Z • More uses of subtyping:

• An improvement O(Id) =T0


O ` e0 :T O ` e1 : T1
T  T0 [Assign]
T1  T0
[Let-Init]
O[T0/x] ` e1 : T1 O ` Id  e1 : T1
O ` let x:T0  e0 in e1 : T1
Prof. Aiken CS 143 Lecture 9 59 Prof. Aiken CS 143 Lecture 9 60

10
Initialized Attributes If-Then-Else

• Let OC(x) = T for all attributes x:T in class C • Consider:


if e0 then e1 else e2 fi
• Attribute initialization is similar to let, except
for the scope of names • The result can be either e1 or e2

OC (Id) =T0 • The type is either e1’s type of e2’s type


OC ` e1 : T1
[Attr-Init] • The best we can do is the smallest supertype
T1  T0
OC ` Id:T0  e1 ; larger than the type of e1 or e2

Prof. Aiken CS 143 Lecture 9 61 Prof. Aiken CS 143 Lecture 9 62

Least Upper Bounds If-Then-Else Revisited

• lub(X,Y), the least upper bound of X and Y, is


Z if
– XZYZ O ` e0 : Bool
Z is an upper bound O ` e1 : T1
O ` e2 : T2 [If-Then-Else]
– X  Z’  Y  Z’  Z  Z’
Z is least among upper bounds O ` if
i e0 then e1 else e2 fi : lub(T1 ,T2 )

• In COOL, the least upper bound of two types


is their least common ancestor in the
inheritance tree
Prof. Aiken CS 143 Lecture 9 63 Prof. Aiken CS 143 Lecture 9 64

Case Method Dispatch

• The rule for case expressions takes a lub over • There is a problem with type checking method
all branches calls:
O ` e0 : T0
O ` e1 : T1
O
O `` ee00 :: T
T00

O[T1 //x
O[T / x1 1]] ``e1e:1 T  1’
: 1T [Case] [Dispatch]
O ` en : Tn

OO``ie0e.f(e 1 ,,en )
0.f(e1,...,e
:
n): ?
O[T / xn]] ``een: T
O[Tnn //x :T n’
n n n
• We need information about the formal
OO``icase
case ee00 of x11:T11 
) e11;
…;;xxnn:T ) eenn;; esac
:Tnn  lub(T11’,…,T
esac : lub(T ,,T 
n )
n’)
parameters and return type of f

Prof. Aiken CS 143 Lecture 9 65 Prof. Aiken CS 143 Lecture 9 66

11
Notes on Dispatch The Dispatch Rule Revisited

• In Cool, method and object identifiers live in


O,M ` e0 : T0
different name spaces
– A method foo and an object foo can coexist in the
O,M ` e1 : T1
same scope 
• In the type rules, this is reflected by a O,M ` en : Tn
separate mapping M for method signatures [Dispatch]
M(T0 , f) = (T1,,Tn,Tn+1
 )
M(C,f) = (T1,. . .Tn,Tn+1)
Ti  Ti for 1  i  n
means in class C there is a method f
O,M ` ie00.f(e
O,M 1 , ,enn):
.f(e1,...,e ) T’
:Tn+1

f(x1:T1,. . .,xn:Tn): Tn+1
Prof. Aiken CS 143 Lecture 9 67 Prof. Aiken CS 143 Lecture 9 68

Static Dispatch Static Dispatch (Cont.)

• Static dispatch is a variation on normal O,M ` e0 : T0


dispatch O,M ` e1 : T1

• The method is found in the class explicitly O,M ` en : Tn
named by the programmer T0  T [StaticDispatch]
M(T, f) = (T1,,Tn,Tn+1
 )
• The inferred type of the dispatch expression
Ti  Ti for 1  i  n
must conform to the specified type
O,M
O,M `iee00@T.f(e 1 , ,enn):
@T.f(e1,...,e ) :T 
T’n+1
n+1

Prof. Aiken CS 143 Lecture 9 69 Prof. Aiken CS 143 Lecture 9 70

The Method Environment More Environments

• The method environment must be added to all • For some cases involving SELF_TYPE, we need
rules to know the class in which an expression
• In most cases, M is passed down but not appears
actually used • The full type environment for COOL:
– Only the dispatch rules use M – A mapping O giving types to object id’s
– A mapping M giving types to methods
O,M ` e1 : Int – The current class C
O,M ` e2 : Int [Add]

O,M ` e1  e2 : Int
Prof. Aiken CS 143 Lecture 9 71 Prof. Aiken CS 143 Lecture 9 72

12
Sentences Type Systems

The form of a sentence in the logic is • The rules in this lecture are COOL-specific
– More info on rules for self next time
O,M,C ` e :T – Other languages have very different rules
• General themes
– Type rules are defined on the structure of
Example: expressions
O,M,C ` e1 : int – Types of variables are modeled by an environment
O,M,C ` e2 : int [Add]
• Warning: Type rules are very compact!
O,M,C ` e1  e2 : int
Prof. Aiken CS 143 Lecture 9 73 Prof. Aiken CS 143 Lecture 9 74

One-Pass Type Checking Implementing Type Systems

• COOL type checking can be implemented in a O,M,C ` e1 : Int


single traversal over the AST
O,M,C ` e2 : Int [Add]

• Type environment is passed down the tree O,M,C ` e1  e2 : Int


– From parent to child TypeCheck(Environment, e1 + e2) = {
T1 = TypeCheck(Environment, e1);
• Types are passed up the tree T2 = TypeCheck(Environment, e2);
– From child to parent Check T1 == T2 == Int;
return Int; }
Prof. Aiken CS 143 Lecture 9 75 Prof. Aiken CS 143 Lecture 9 76

13

You might also like