Professional Documents
Culture Documents
Compilers
Source
Macro Expanded Compiler or obj
Code
(with macro)
Processor Code Assembler
1
Terminology
Statement (敘述)
Declaration, assignment containing expression (運算式)
Grammar (文法)
A set of rules specify the form of legal statements
Syntax (語法) vs. Semantics (語意)
Example: assuming I, J, K:integer and X,Y:float
I:=J+K vs. I:= X+Y
Compilation (編譯)
Matching statements written by the programmer to
structures defined by the grammar and generating the
appropriate object code.
2
Basic Compiler
Lexical analysis - scanner
Scanning the source statement, recognizing and
classifying the various tokens
Syntactic analysis - parser
Recognizing the statement as some language construct.
Construct a parser tree (syntax tree)
Code generation – code generator
Generate assembly language codes
Generate machine codes (Object codes)
3
PROGRAM
Scanner SUM
STATS
:=
VAR
0
SUM
;
,
SUMSQ
SUMSQ
:=
,
I
READ
(
VALUE
)
;
4
Parser
Grammar: a set of rules
Backus-Naur Form (BNF)
Ex: Figure 5.2
Terminology
Define symbol ::=
Nonterminal symbols <>
Alternative symbols |
Terminal symbols
5
Simplified Pascal Grammar
6
Parser
READ(VALUE) <read> ::= READ (<id-list>)
<id-list>::= id | <id-list>,id
<assign>::= id := <exp>
SUM := 0 <exp> ::= <term> |
<exp>+<term> |
SUM := SUM + <exp>-<term>
VALUE <term>::=<factor> |
<term>*<factor> | <term> DIV
<factor>
MEAN := SUM DIV
100 <factor>::= id | int | (<exp>)
7
Syntax Tree
8
Syntax Tree for Program 5.1
9
10
Lexical Analysis
Function
Scanning the program to be compiled and recognizing
the tokens that make up the source statements.
Tokens
Tokens can be keywords, operators, identifiers, integers,
floating-point numbers, character strings, etc.
Each token is usually represented by some fixed-length
code, such as an integer, rather than as a variable-length
character string (see Figure 5.5)
Token type, Token specifier (value) (see Figure 5.6)
11
Scanner Output
Token specifier
Identifier name, integer value, (type)
Token coding scheme
Figure 5.5
12
13
Token Recognizer
By grammar
<ident> ::= <letter> | <ident> <letter> | <ident> <digit>
<letter>::= A | B | C | D | … | Z
<digit> ::= 0 | 1 | 2 | 3 | … | 9
By scanner - modeling as finite automata
Figure 5.8 (a)
14
Recognizing Identifier
Identifiers allowing
underscore (_)
Figure 5.8 (b)
15
Recognizing Identifier
16
Recognizing Integer
Allowing leading zeroes
Figure 5.8 (c)
Disallowing leading zeroes
Figure 5.8 (d)
17
Scanner - Implementation
Figure 5.10 (a)
Algorithmic code for identifier
recognition
Tabular representation of
finite automaton for Figure
5.9.
State A-Z 0-9 ;,+-*() : = .
1 2 4 5 6
2 2 2 3
3
4 4
5
6 7
7
18
Syntactic Analysis
Recognize source statements as language
constructs or build the parse tree for the
statements.
Bottom-up
Operator-precedence parsing
Shift-reduce parsing
LR(0) parsing
LR(1) parsing
SLR(1) parsing
LALR(1) parsing
Top-down
Recursive-descent parsing
LL(1) parsing
19
Operator-Precedence Parsing
Operator
Any terminal symbol (or any token)
Precedence
* »+
+«*
Operator-precedence
Precedence relations between operators
20
Precedence Matrix for the Fig. 5.2
21
Operator-Precedence Parse Example
BEGIN READ ( VALUE ) ;
22
Operator-Precedence Parse Example
23
Operator-Precedence Parse Example
24
Operator-Precedence Parse Example
25
Operator-Precedence Parsing
Bottom-up parsing
Generating precedence matrix
Aho et al. (1988)
26
Shift-reduce Parsing with Stack
Figure 5.14
27
Recursive-Descent Parsing
Each nonterminal symbol in the grammar is
associated with a procedure.
<read> ::= READ (<id-list>)
<stmt> ::= <assign> | <read> | <write> | <for>
Left recursion
<dec-list> ::= <dec> | <dec-list>;<dec>
Modification
<dec-list> ::= <dec> {;<dec>}
28
Recursive-Descent Parsing (cont’d.)
29
Recursive-Descent Parsing of READ
30
Recursive-Descent Parsing of IDLIST
31
Recursive-Descent Parsing (cont’d.)
32
Recursive-Descent Parsing of ASSIGN
33
Recursive-Descent Parsing of EXP
34
Recursive-Descent Parsing of TERM
35
Recursive-Descent Parsing of FACTOR
36
Recursive-Descent Parsing (cont’d.)
37
Recursive-Descent Parsing (cont’d.)
38
Recursive-Descent Parsing (cont’d.)
39
Code Generation
Add S(id) to LIST and LISTCOUNT++
40
Code Generation (cont’d.)
41
Code Generation (cont’d.)
42
Code Generation (cont’d.)
43
Code Generation (cont’d.)
46
Code Generation (cont’d.) 9 and 10
S(<exp>) := S(<term>)
S(<exp>) <> rA =S(MEAN)
47
Code Generation (cont’d.) 10
48
Code Generation (cont’d.) 10
S(<exp>2) <> rA
Call GETA(<exp>2)
Generate [SUB S(MEAN)=T2]
S(<exp>1) := rA
REGA := <exp>1
49
Code Generation (cont’d.) 11
S(<term>) := S(SUMSQ)
S(<term>) <> rA
S(<term>) := S(MEAN)
50
Code Generation (cont’d.) 11
S(<term>2) <> rA
S(<factor>) <> rA
Call GETA(MEAN)
Generate [MUL MEAN]
S(MEAN) := rA
REGA:= MEAN
51
Code Generation (cont’d.) 11
S(<term>2) <> rA
Call GETA(SUMSQ)
Generate [DIV #100]
S(<term>1) := rA
REGA := <term>1
52
Code Generation (cont’d.) 12
S(<factor>) := S(SUMSQ)
S(<factor>) := S(MEAN)
S(<factor>) := S(MEAN)
S(<factor>) := S(#100)
53
Code Generation (cont’d.) REGA <> NULL = SUMSQ
S(MEAN) <> rA = S(SUMSQ)
Generate [STA T1]
S(SUMSQ) = T1
Generate [LDA MEAN]
S(MEAN) := rA
REGA := MEAN
REGA = NULL
Generate [LDA SUMSQ]
S(SUMSQ) := rA
REGA := SUMSQ
REGA <> NULL = MEAN
S(SUMSQ) <> rA = S(MEAN)
Generate [STA T2]
S(MEAN) = T2
Generate [LDA S(SUMSQ)=T1]
S(SUMSQ) := rA
REGA := SUMSQ 54
Code Generation (cont’d.)
55
Code Generation (cont’d.) 1, 2, and 3
56
Code Generation (cont’d.) 4, 5, and 7
57
8, 14, 15
58
Code Generation (cont’d.) 16 and 17
59
Code Generation (cont’d.)
60
Code Generation (cont’d.)
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
3*(6-1)=3*5=15
76
77
78
79
80
81
82
83
Machine-Independent Compiler Features
84
Machine-Independent Compiler Features
Storage Allocation
Static Allocation: cannot be used for recursive call
Dynamic Allocation: Activation Record
85
86
87
88
89
90
91
92
P-Code Compliers
93
Compiler-Compilers
94