
1

7. Compilers

2/17/2018

John Roberts

2
Overview
• Compilation Process

3
Compilation Process

Source Program (stream of characters)
  -> Lexical Analysis
  -> Tokens (stream of lexical units)
  -> Parsing (grammars, recursive descent processing)
  -> Abstract Syntax Tree (AST)
  -> Constraining (type check, decorate AST, symbol table)
  -> Decorated AST
  -> Code Generation (bytecodes, runtime stack)
  -> Bytecode Program
  -> Interpreter (Virtual Machine)
4
Extended Example

• Go through the compilation process and look at the artifacts at each stage

• General introduction: over the next few weeks we’ll be looking at individual stages in greater depth (starting with lexical analysis today)

5
Source Program

[Diagram: Source Program (stream of characters) -> Lexical Analysis]

1 program { int i int j
2    i = i + j + 7
3    j = write(i)
4 }

6
Lexical Analysis

[Diagram: Lexical Analysis -> Tokens (stream of lexical units)]

• Takes the source code (which may have been modified by a preprocessor) and breaks it into a series of tokens

• Removes whitespace

• Removes comments
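A minimal sketch of the scanning step, written in Python with a hypothetical `scan_line` helper. It reports each lexeme together with the left/right column positions used on the Tokens slides. The `//` comment syntax is an assumption, since the slides do not show the language's comment form.

```python
import re

# Hypothetical lexeme pattern: identifiers/keywords, integer literals,
# and the single-character symbols used by the example program.
LEXEME_RE = re.compile(r"[A-Za-z_]\w*|\d+|[{}()=+]")


def scan_line(line):
    """Split one source line into (lexeme, left, right) triples.

    left and right are 0-based columns of the lexeme's first and last
    characters. Whitespace is skipped, and a '//' comment (assumed syntax)
    discards the rest of the line.
    """
    code = line.split("//", 1)[0]          # remove an end-of-line comment
    tokens = []
    pos = 0
    while pos < len(code):
        if code[pos].isspace():            # whitespace separates lexemes but is dropped
            pos += 1
            continue
        match = LEXEME_RE.match(code, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {code[pos]!r} at column {pos}")
        tokens.append((match.group(), match.start(), match.end() - 1))
        pos = match.end()
    return tokens


for lexeme, left, right in scan_line("   i = i + j + 7  // update i"):
    print(f"{lexeme} left: {left}, right: {right}")
```

Running it prints `i left: 3, right: 3` through `7 left: 15, right: 15`, which are the same positions listed for line 2 on the Tokens slides. The comment text never reaches the token list.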
7
Tokens

[Diagram: Tokens (stream of lexical units) -> Parsing (grammars, recursive descent processing)]

• Tokens are strings with an assigned meaning

program leftBrace intType <id:i> intType <id:j>

<id:i> assign <id:i> plus <id:j> plus <int:7>

<id:j> assign <id:write> leftParen <id:i> rightParen

rightBrace
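Assigning a meaning to each string can be sketched as a lookup from lexeme to token kind, using the kind names from the slide. The `KINDS` table and the `tokenize` function are illustrative assumptions, not the course's actual lexer.

```python
import re

# Token kinds from the slide; this lookup table is an illustrative assumption.
KINDS = {
    "program": "program", "int": "intType",
    "{": "leftBrace", "}": "rightBrace",
    "(": "leftParen", ")": "rightParen",
    "=": "assign", "+": "plus",
}
LEXEME_RE = re.compile(r"[A-Za-z_]\w*|\d+|[{}()=+]")

SOURCE = """program { int i int j
   i = i + j + 7
   j = write(i)
}"""


def tokenize(source):
    """Return the token stream for a whole program as a list of strings."""
    tokens = []
    # findall skips whatever doesn't match: whitespace here, but also any
    # illegal character, which a real lexer should report as an error.
    for lexeme in LEXEME_RE.findall(source):
        if lexeme in KINDS:
            tokens.append(KINDS[lexeme])
        elif lexeme.isdigit():
            tokens.append(f"<int:{lexeme}>")
        else:
            tokens.append(f"<id:{lexeme}>")
    return tokens


print(" ".join(tokenize(SOURCE)))
```

Note that `write` comes out as an ordinary identifier (`<id:write>`). Only later stages learn that it names a function.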

8
Tokens

READLINE program { int i int j

program left: 0, right: 6

{ left: 8, right: 8

int left: 10, right: 12

i left: 14, right: 14

int left: 16, right: 18

j left: 20, right: 20

9
Tokens

READLINE "   i = i + j + 7"

i left: 3, right: 3

= left: 5, right: 5

i left: 7, right: 7

+ left: 9, right: 9

j left: 11, right: 11

+ left: 13, right: 13

7 left: 15, right: 15


10
Tokens

READLINE "   j = write(i)"

j left: 3, right: 3

= left: 5, right: 5

write left: 7, right: 11

( left: 12, right: 12

i left: 13, right: 13

) left: 14, right: 14

11
Tokens

READLINE }

} left: 0, right: 0

12
Parsing (Syntax Analysis)

[Diagram: Parsing (grammars, recursive descent processing) -> Abstract Syntax Tree (AST)]

• Parsing is the process of analyzing a stream of symbols (here, tokens) to determine whether it conforms to the grammar for the language

• Builds the Abstract Syntax Tree (AST)
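A recursive descent parser has one function per grammar rule, and each function consumes the tokens its rule matches. The sketch below assumes a grammar inferred from the example program alone (the real course grammar is larger) and builds the AST as nested Python tuples. `Parser` and the tuple layout are hypothetical.

```python
import re

# Assumed grammar (inferred from the example program only):
#   program -> 'program' block
#   block   -> '{' decl* assign* '}'
#   decl    -> 'int' ID
#   assign  -> ID '=' expr
#   expr    -> term ('+' term)*
#   term    -> INT | ID '(' expr ')' | ID
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"program", "int"}

SOURCE = """program { int i int j
   i = i + j + 7
   j = write(i)
}"""


class Parser:
    """Recursive descent: one method per (assumed) grammar rule."""

    def __init__(self, source):
        self.tokens = re.findall(r"[A-Za-z_]\w*|\d+|[{}()=+]", source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, lexeme=None):
        """Consume the next token, optionally requiring a specific lexeme."""
        tok = self.peek()
        if tok is None or (lexeme is not None and tok != lexeme):
            raise SyntaxError(f"expected {lexeme!r}, found {tok!r}")
        self.pos += 1
        return tok

    def identifier(self):
        tok = self.expect()
        if not IDENT_RE.fullmatch(tok) or tok in KEYWORDS:
            raise SyntaxError(f"expected an identifier, found {tok!r}")
        return ("id", tok)

    def program(self):
        self.expect("program")
        tree = ("program", self.block())
        if self.peek() is not None:
            raise SyntaxError("tokens after the end of the program")
        return tree

    def block(self):
        self.expect("{")
        children = []
        while self.peek() == "int":          # declarations come first
            children.append(self.decl())
        while self.peek() != "}":            # then statements until '}'
            children.append(self.assign())
        self.expect("}")
        return ("block", children)

    def decl(self):
        self.expect("int")
        return ("decl", "int", self.identifier())

    def assign(self):
        target = self.identifier()
        self.expect("=")
        return ("assign", target, self.expr())

    def expr(self):
        left = self.term()
        while self.peek() == "+":            # loop makes '+' left-associative
            self.expect("+")
            left = ("+", left, self.term())
        return left

    def term(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.expect()
            return ("int", int(tok))
        name = self.identifier()
        if self.peek() == "(":               # ID '(' expr ')' is a call
            self.expect("(")
            arg = self.expr()
            self.expect(")")
            return ("call", name, arg)
        return name


print(Parser(SOURCE).program())
```

Because `expr` loops instead of recursing on its right side, `i + j + 7` becomes `(i + j) + 7`, which matches the shape of the AST on the following slides.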


13
(From Wikipedia: https://en.wikipedia.org/wiki/Abstract_syntax_tree)
Abstract Syntax Tree

[Diagram: Abstract Syntax Tree (AST) -> Constraining (type check, decorate AST, symbol table)]

• The Abstract Syntax Tree is a tree representation of the abstract syntactic structure of the source code

• It is “abstract” because it does not represent every detail appearing in the real syntax (e.g. grouping parentheses are implied by the tree structure)
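A small illustration of the "abstract" part: the hypothetical `parse_expr` below handles only `+`, identifiers, integers and parentheses. Parentheses steer the shape of the tree but never become nodes, so `(i + j) + 7` and `i + j + 7` yield the same AST, while `i + (j + 7)` does not.

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def parse_expr(text):
    """Parse '+' expressions over identifiers, integers and parentheses.

    Returns nested tuples. Parentheses only decide the shape of the tree;
    no node is ever created for them.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[()+]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():                           # expr -> term ('+' term)*
        left = term()
        while peek() == "+":
            advance()
            left = ("+", left, term())    # left-associative
        return left

    def term():                           # term -> '(' expr ')' | INT | ID
        tok = advance()
        if tok == "(":
            inner = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner                  # the parentheses vanish here
        if tok.isdigit():
            return ("int", int(tok))
        if IDENT_RE.fullmatch(tok):
            return ("id", tok)
        raise SyntaxError(f"unexpected {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


print(parse_expr("i + j + 7") == parse_expr("(i + j) + 7"))   # True
print(parse_expr("i + j + 7") == parse_expr("i + (j + 7)"))   # False
```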

14
(One way to parse and construct an AST; the next slide shows our implementation)


Abstract Syntax Tree

program
  block
    decl
      int
      i
    decl
      int
      j
    assign
      i
      +
        +
          i
          j
        7
    assign
      j
      call
        write
        i

15
Abstract Syntax Tree

1: Program
  2: Block
    5: Decl
      3: IntType
      4: Id: i
    8: Decl
      6: IntType
      7: Id: j
    10: Assign
      9: Id: i
      14: AddOp: +
        12: AddOp: +
          11: Id: i
          13: Id: j
        15: Int: 7
    17: Assign
      16: Id: j
      19: Call
        18: Id: write
        20: Id: i


16
Constraining

[Diagram: Constraining (type check, decorate AST, symbol table) -> Decorated AST]

• Type checking, decorating the AST with additional information, and building a symbol table

• The symbol table stores information about the identifiers found in the program’s source code, such as where each one is declared or appears

• AST nodes that mention symbols are enriched with a reference to the identifier’s entry in the symbol table
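A minimal sketch of constraining the example program, assuming a single global scope (a real constrainer must also handle nested scopes). `Node`, `SymbolTable` and `constrain` are hypothetical names. `put` and `get` play the roles of the Put/Get annotations on the decorated AST, each use of an identifier is decorated with a reference to its declaration, and `write` is pre-declared as `int write(int dummyFormal)`, standing in for an intrinsic tree.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    kind: str                                    # "decl", "id", "+", "call", ...
    children: list = field(default_factory=list)
    name: str = ""                               # identifier name or literal text
    type: str = ""                               # decoration: filled in by constrain()
    dec: "Node | None" = None                    # decoration: the declaring node


class SymbolTable:
    """Single-scope table mapping identifier names to declaration nodes."""

    def __init__(self):
        self.entries = {}

    def put(self, name, decl):
        if name in self.entries:
            raise TypeError(f"{name} is declared twice")
        self.entries[name] = decl

    def get(self, name):
        if name not in self.entries:
            raise TypeError(f"{name} is used but never declared")
        return self.entries[name]


def constrain(node, table):
    """Type-check node, decorate it in place, and return its type (or None)."""
    if node.kind in ("program", "block"):
        for child in node.children:
            constrain(child, table)
        return None
    if node.kind == "decl":                      # children: [intType, id]
        node.type = "int"                        # int is the only type in this sketch
        table.put(node.children[1].name, node)   # Put
        return None
    if node.kind == "id":
        node.dec = table.get(node.name)          # Get
        if node.dec.kind != "decl":
            raise TypeError(f"{node.name} is not a variable")
        node.type = node.dec.type
        return node.type
    if node.kind == "int":
        node.type = "int"
        return node.type
    if node.kind == "+":
        left, right = (constrain(child, table) for child in node.children)
        if left != "int" or right != "int":
            raise TypeError("+ expects int operands")
        node.type = "int"
        return node.type
    if node.kind == "assign":
        target, value = node.children
        if constrain(target, table) != constrain(value, table):
            raise TypeError(f"type mismatch in assignment to {target.name}")
        return None
    if node.kind == "call":
        func, arg = node.children
        func.dec = table.get(func.name)          # Get the function's declaration
        if func.dec.kind != "functionDecl":
            raise TypeError(f"{func.name} is not a function")
        formal = func.dec.children[0]            # this sketch allows one formal
        if constrain(arg, table) != formal.type:
            raise TypeError(f"bad argument to {func.name}")
        node.type = func.dec.type                # a call has its return type
        return node.type
    raise ValueError(f"unknown node kind {node.kind!r}")


def ident(name):
    return Node("id", name=name)


# Hand-built AST for the example program.
decl_i = Node("decl", [Node("intType"), ident("i")])
decl_j = Node("decl", [Node("intType"), ident("j")])
assign_i = Node("assign", [ident("i"), Node("+", [Node("+", [ident("i"), ident("j")]),
                                                   Node("int", name="7")])])
assign_j = Node("assign", [ident("j"), Node("call", [ident("write"), ident("i")])])
program = Node("program", [Node("block", [decl_i, decl_j, assign_i, assign_j])])

table = SymbolTable()
# Stand-in for the intrinsic tree: int write(int dummyFormal).
table.put("write", Node("functionDecl",
                        [Node("decl", [Node("intType"), ident("dummyFormal")], type="int")],
                        name="write", type="int"))
constrain(program, table)
print(assign_i.children[0].dec is decl_i)   # True: the use of i points at its decl
```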

17
(More on symbol tables and intrinsic trees later; the write tree is not shown)

Decorated AST
[Figure: the AST for the example program, decorated. Legend: Symbol Table Entry, Intrinsic Tree. The decl nodes for i and j are marked Put 1 and Put 2 (symbol table entries 1 and 2); the uses of i and j are marked Get 1 and Get 2. Type and expression nodes are annotated with the Int Tree, and write with the Write Tree. The Int Tree is a decl with children int and <<int>>.]

18
Decorated AST

(Each node has Dec, Address and Label fields; "-> N" means the node's Dec field refers to node N.)

1: Program
  2: Block
    5: Decl i (address 0)
      3: IntType -> 28
      4: Id: i
    8: Decl j (address 1)
      6: IntType -> 28
      7: Id: j
    10: Assign
      9: Id: i -> 5
      14: AddOp: + -> 28
        12: AddOp: + -> 28
          11: Id: i -> 5
          13: Id: j -> 8
        15: Int: 7 -> 28
    17: Assign
      16: Id: j -> 8
      19: Call -> 28
        18: Id: write -> 35
        20: Id: i -> 5

28: Decl
  29: IntType -> 28
  30: Id: <<int>>

35: FunctionDecl write
  36: IntType -> 28
  24: Id: write
  40: Formals
    37: Decl
      38: IntType -> 28
      39: Id: dummyFormal
  41: Block
19
Code Generation

[Diagram: Code Generation (bytecodes, runtime stack) -> Bytecode Program]

• Walk the Decorated AST (DAST)

• Generate bytecodes

• Track how each bytecode affects the runtime stack: whenever we determine a frame offset for a variable declaration, set the declaration's address field to that offset (more later…)
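A sketch of this walk over an AST stored as nested tuples (a hypothetical layout chosen for brevity). Each declaration receives the next free frame offset, which plays the role of the address field, and every later load or store of that variable uses the same offset. It emits the instruction sequence that the Code Generation frame trace walks through, including `LIT 7` and `BOP +` for the `+ 7` step. The `generate` name and the fixed read/write prologue are assumptions.

```python
# AST for the example program as nested tuples (hypothetical layout).
TREE = ("program", ("block", [
    ("decl", "int", ("id", "i")),
    ("decl", "int", ("id", "j")),
    ("assign", ("id", "i"), ("+", ("+", ("id", "i"), ("id", "j")), ("int", 7))),
    ("assign", ("id", "j"), ("call", ("id", "write"), ("id", "i"))),
]))


def generate(tree):
    """Walk the AST and return a list of bytecode strings."""
    code = [
        # Assumed fixed prologue: jump over the intrinsic read/write functions.
        "GOTO start<<1>>",
        "LABEL read", "READ", "RETURN",
        "LABEL write", "LOAD 0 dummyFormal", "WRITE", "RETURN",
        "LABEL start<<1>>",
    ]
    offsets = {}   # variable name -> frame offset (the decl's address field)

    def expr(node):
        kind = node[0]
        if kind == "int":
            code.append(f"LIT {node[1]}")
        elif kind == "id":
            code.append(f"LOAD {offsets[node[1]]} {node[1]}")
        elif kind == "+":
            expr(node[1])                 # left operand ends up below...
            expr(node[2])                 # ...the right operand on the stack
            code.append("BOP +")
        elif kind == "call":
            expr(node[2])                 # push the (single) argument
            code.append("ARGS 1")
            code.append(f"CALL {node[1][1]}")
        else:
            raise ValueError(f"unknown expression {kind!r}")

    _, (_, statements) = tree
    for stmt in statements:
        if stmt[0] == "decl":
            name = stmt[2][1]
            offsets[name] = len(offsets)  # next free slot in the current frame
            code.append(f"LIT 0 {name}")
        else:                             # ("assign", ("id", name), value)
            _, (_, name), value = stmt
            expr(value)
            code.append(f"STORE {offsets[name]} {name}")
    code.append(f"POP {len(offsets)}")    # remove local variables from the stack
    code.append("HALT")
    return code


print("\n".join(generate(TREE)))
```

A single `offsets` dictionary is enough only because the example has one scope. With nested scopes (as with `dummyFormal` inside `write`), each frame needs its own offset counter.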

20
(Note the two occurrences of addr = 0: i in the main program's frame and dummyFormal in write's frame. Scope is important!)


Legend: Intrinsic Tree, Address/Label

program
  block
    decl (addr = 0)
      int
      i
    decl (addr = 1)
      int
      j
    assign
      i
      +
        +
          i
          j
        7
    assign
      j
      call
        write
        i

Int Tree:
decl
  int
  <<int>>

Write Tree:
functionDecl (label = write)
  int
  write
  formals
    decl (addr = 0)
      int
      dummyFormal
  block

21
Code Generation
(Discuss what an activation record is)

Frame (Activation Record)   Comments
0 0                         load initial values of local variables (i and j); i is first, j is second
                            generate code for i = i + j + 7
0 0 0                       load i
0 0 0 0                     load j
0 0 0                       add (sum is on top of frame)
0 0 0 7                     load 7
0 0 7                       add
7 0                         store into i
22
Code Generation
(Do we need to refresh on stack frames? Covered in 415…)

Frame (Activation Record)   Comments
                            generate code for j = write(i)
7 0 7                       load i
                            ARGS 1: one argument for function
                            Call write
7 0 | 7                     branch to write function; starts new frame with arg(s) on top (7 is the only arg, so it’s in the first slot of the new frame)
7 0 | 7 7                   load arg, write to output; written value remains on stack
7 0 7                       return with value written as return value (stack frame popped, return value placed on stack)

(New stack frame on function invocation; argument copies are pushed to the stack. By value! What’s pushed for by reference?)

23
Code Generation

Frame (Activation Record)   Comments
7 7                         store into j
(empty)                     clear local variables and halt

24
Byte Codes

Byte Code                   Comments
GOTO start<<1>>
LABEL read
READ                        Read function
RETURN
LABEL write
LOAD 0 dummyFormal          Write function
WRITE
RETURN
25
Byte Codes

Byte Code                   Comments
LABEL start<<1>>            i = i + j + 7
LIT 0 i                     init i to 0; i is offset 0 in current frame
LIT 0 j                     init j to 0; j is offset 1 in current frame
LOAD 0 i                    load variable at offset 0 (i)
LOAD 1 j                    load variable at offset 1 (j)
BOP +
LIT 7
BOP +
STORE 0 i                   store into i

26
Byte Codes

Byte Code                   Comments
LOAD 0 i                    j = write(i)
ARGS 1
CALL write
STORE 1 j
POP 2                       remove local variables from the stack
HALT
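The last stage, the Interpreter (Virtual Machine), can be sketched as a loop over these bytecodes. The VM below runs the listing for the example program (with `LIT 7` and `BOP +` for the `+ 7` step) and records the runtime stack after each instruction, printing frames separated by `|` as the Code Generation slides do. The instruction semantics are inferred from those slides' comments. `run`, `format_frames`, and the exact handling of `ARGS`/`CALL`/`RETURN` are assumptions, not the course's VM.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

PROGRAM = [
    "GOTO start<<1>>",
    "LABEL read", "READ", "RETURN",
    "LABEL write", "LOAD 0 dummyFormal", "WRITE", "RETURN",
    "LABEL start<<1>>",
    "LIT 0 i", "LIT 0 j",
    "LOAD 0 i", "LOAD 1 j", "BOP +", "LIT 7", "BOP +", "STORE 0 i",
    "LOAD 0 i", "ARGS 1", "CALL write", "STORE 1 j",
    "POP 2", "HALT",
]


def format_frames(stack, frame_starts):
    """Render the runtime stack with ' | ' between activation records."""
    ends = frame_starts[1:] + [len(stack)]
    return " | ".join(" ".join(str(v) for v in stack[start:end])
                      for start, end in zip(frame_starts, ends))


def run(program, inputs=()):
    """Execute bytecodes; return (written values, stack snapshot per instruction)."""
    labels = {line.split()[1]: i for i, line in enumerate(program)
              if line.startswith("LABEL ")}
    stack, frame_starts, return_addrs = [], [0], []
    pending_args, pc = 0, 0
    inputs, output, trace = list(inputs), [], []
    while True:
        op, *args = program[pc].split()
        pc += 1
        fp = frame_starts[-1]                     # base of the current frame
        if op == "LIT":
            stack.append(int(args[0]))
        elif op == "LOAD":                        # copy a frame slot to the top
            stack.append(stack[fp + int(args[0])])
        elif op == "STORE":                       # pop the top into a frame slot
            stack[fp + int(args[0])] = stack.pop()
        elif op == "BOP":
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[args[0]](left, right))
        elif op == "ARGS":                        # how many values belong to the next call
            pending_args = int(args[0])
        elif op == "CALL":                        # new frame starts at the arguments
            return_addrs.append(pc)
            frame_starts.append(len(stack) - pending_args)
            pending_args = 0
            pc = labels[args[0]]
        elif op == "RETURN":                      # pop frame, leave return value on top
            value = stack.pop()
            del stack[frame_starts.pop():]
            stack.append(value)
            pc = return_addrs.pop()
        elif op == "WRITE":                       # written value stays on the stack
            output.append(stack[-1])
        elif op == "READ":                        # reads come from `inputs` here
            stack.append(int(inputs.pop(0)))
        elif op == "GOTO":
            pc = labels[args[0]]
        elif op == "POP":
            del stack[len(stack) - int(args[0]):]
        elif op == "HALT":
            return output, trace
        elif op != "LABEL":                       # LABEL is a no-op at run time
            raise ValueError(f"unknown bytecode {op!r}")
        trace.append(format_frames(stack, frame_starts))


output, trace = run(PROGRAM)
print(output)   # [7]
for frame in trace:
    print(frame)
```

Reading down the trace reproduces the Frame column of the Code Generation slides: `0 0 0 7`, `0 0 7`, `7 0`, then `7 0 | 7` and `7 0 | 7 7` inside `write`, `7 0 7` after the return, and `7 7` after the store into j.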

27
(Tokens, Symbols, Tree, Tables, Lists, etc.)


Question

• If we were building a system to automate this (i.e. a compiler), what entities should we design? What are the objects?
