
COMP 412
FALL 2015

Parsing V

LL(1) Parsing, start of Bottom-up Parsing


[Figure: compiler structure for Comp 412. source code → Front End → IR → Optimizer → IR → Back End → target code]

Copyright 2015, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these
materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit educational purposes,
provided this copyright notice is preserved.

Chapter 3 in EaC2e

Predictive Parsing

Review from last lecture

Given a grammar that has the LL(1) property
  We can write a simple routine to recognize an instance of each LHS
  Code is patterned, simple, & fast
Consider A → β1 | β2 | β3, with

  FIRST+(A → βi) ∩ FIRST+(A → βj) = ∅ if i ≠ j

/* find an A */
if (current_word ∈ FIRST+(A → β1))
    find a β1 and return true
else if (current_word ∈ FIRST+(A → β2))
    find a β2 and return true
else if (current_word ∈ FIRST+(A → β3))
    find a β3 and return true
else
    report an error and return false

COMP 412, Fall 2015

Grammars that have the LL(1) property are called predictive grammars because the parser can predict the correct expansion at each point in the parse.
Parsers that capitalize on the LL(1) property are called predictive parsers.
One kind of predictive parser is the recursive descent parser.

Of course, there is more detail to "find a βi": typically a recursive call to another small routine (see pp. 108-111 in EaC2e)

Recursive Descent Parsing

Review from last lecture

Recall the expression grammar, after transformation

   0  Goal    → Expr
   1  Expr    → Term Expr'
   2  Expr'   → + Term Expr'
   3          | - Term Expr'
   4          | ε
   5  Term    → Factor Term'
   6  Term'   → * Factor Term'
   7          | / Factor Term'
   8          | ε
   9  Factor  → ( Expr )
  10          | number
  11          | id

This grammar leads to a parser that has six mutually recursive routines:
1. Goal
2. Expr
3. EPrime
4. Term
5. TPrime
6. Factor
Each routine recognizes an RHS for that NT.
The term descent refers to the direction in which the parse tree is built.

Recursive Descent Parsing

Review from last lecture

A couple of routines from the expression parser

Goal( )
    token ← next_token( );
    if (Expr( ) = true & token = EOF)
        then next compilation step;
        else
            report syntax error;
            return false;

Expr( )
    if (Term( ) = false)
        then return false;
        else return EPrime( );

Factor( )
    if (token = number) then
        token ← next_token( );
        return true;
    else if (token = identifier) then
        token ← next_token( );
        return true;
    else if (token = lparen) then
        token ← next_token( );
        if (Expr( ) = true & token = rparen) then
            token ← next_token( );
            return true;
    // fall out of if statement
    report syntax error;
        // looking for number, identifier, or (, found token instead,
        // or failed to find Expr or ) after (
    return false;

EPrime, Term, & TPrime follow the same basic lines (Figure 3.10, EaC2e)

Recursive Descent
Page 111 in EaC2e sketches a recursive descent parser for the right-recursive version of the classic expression grammar.
  One routine per NT
  Check each RHS by checking each symbol
  Includes ε-productions

Your lab 2 parsers are not much more complex than the example.

Review from last lecture

Main( )
    /* Goal → Expr */
    word ← NextWord( );
    if (Expr( ))
        then if (word = eof)
            then report success;
            else Fail( );

Fail( )
    report syntax error;
    attempt error recovery or exit;

Expr( )
    /* Expr → Term Expr' */
    if (Term( ))
        then return EPrime( );
        else Fail( );

EPrime( )
    /* Expr' → + Term Expr' */
    /* Expr' → - Term Expr' */
    if (word = + or word = -)
        then begin;
            word ← NextWord( );
            if (Term( ))
                then return EPrime( );
                else Fail( );
        end;
    else if (word = ) or word = eof)
        /* Expr' → ε */
        then return true;
        else Fail( );

Term( )
    /* Term → Factor Term' */
    if (Factor( ))
        then return TPrime( );
        else Fail( );

TPrime( )
    /* Term' → * Factor Term' */
    /* Term' → / Factor Term' */
    if (word = * or word = /)
        then begin;
            word ← NextWord( );
            if (Factor( ))
                then return TPrime( );
                else Fail( );
        end;
    else if (word = + or word = - or
             word = ) or word = eof)
        /* Term' → ε */
        then return true;
        else Fail( );

Factor( )
    /* Factor → ( Expr ) */
    if (word = ( ) then begin;
        word ← NextWord( );
        if (not Expr( ))
            then Fail( );
        if (word ≠ ) )
            then Fail( );
        word ← NextWord( );
        return true;
    end;
    /* Factor → num */
    /* Factor → name */
    else if (word = num or
             word = name)
        then begin;
            word ← NextWord( );
            return true;
        end;
    else Fail( );

FIGURE 3.12 Recursive-Descent Parser for Expressions
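The figure's routines translate almost line for line into running code. The Python below is one possible rendering, not the book's code: the names `ExprParser`, `ParseError`, and `parses`, and the token spellings "num", "name", and "eof", are assumptions made for illustration, and Fail( ) is modeled by raising an exception rather than attempting recovery.

```python
class ParseError(Exception):
    """Raised where the figure calls Fail( )."""


class ExprParser:
    def __init__(self, words):
        # The scanner is assumed to have produced a list of token kinds.
        self.words = list(words) + ["eof"]
        self.pos = 0

    @property
    def word(self):
        return self.words[self.pos]

    def next_word(self):
        self.pos += 1

    def fail(self, expected):
        raise ParseError(f"expected {expected}, found {self.word!r}")

    def goal(self):                      # Goal -> Expr
        self.expr()
        if self.word != "eof":
            self.fail("eof")
        return True

    def expr(self):                      # Expr -> Term Expr'
        self.term()
        self.eprime()

    def eprime(self):
        if self.word in ("+", "-"):      # Expr' -> + Term Expr' | - Term Expr'
            self.next_word()
            self.term()
            self.eprime()
        elif self.word in (")", "eof"):  # Expr' -> epsilon
            return
        else:
            self.fail("+, -, ) or eof")

    def term(self):                      # Term -> Factor Term'
        self.factor()
        self.tprime()

    def tprime(self):
        if self.word in ("*", "/"):      # Term' -> * Factor Term' | / Factor Term'
            self.next_word()
            self.factor()
            self.tprime()
        elif self.word in ("+", "-", ")", "eof"):   # Term' -> epsilon
            return
        else:
            self.fail("an operator, ) or eof")

    def factor(self):
        if self.word == "(":             # Factor -> ( Expr )
            self.next_word()
            self.expr()
            if self.word != ")":
                self.fail(")")
            self.next_word()
        elif self.word in ("num", "name"):   # Factor -> num | name
            self.next_word()
        else:
            self.fail("(, num or name")


def parses(words):
    """Return True if the token list is a sentence of the expression grammar."""
    try:
        return ExprParser(words).goal()
    except ParseError:
        return False
```

For example, `parses(["name", "+", "num", "*", "name"])` accepts, while `parses(["name", "+"])` rejects because Factor sees eof.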

Top-Down Recursive Descent Parser

At this point, you have enough information to build a top-down recursive-descent parser
  Need a right-recursive grammar that meets the LL(1) condition
    Can use left-factoring to eliminate common prefixes
    Can transform direct left recursion into right recursion
    Need a general algorithm to handle indirect left recursion
  Need to build FIRST, FOLLOW, and FIRST+ sets
  Emit a routine for each non-terminal
    Nest of if-then-else statements to check alternate rhs's
    Each returns true on success and throws an error on failure
    Simple, working (perhaps ugly) code

Could automatically construct a recursive-descent parser
  (I don't know of a system that does this)

Can we do better?

Implementing a Recursive Descent Parser

A nest of if-then-else statements may be slow
  A good case statement would be an improvement
    Python?
    See EaC2e, § 7.8.3
    Encode with computation rather than repeated branches
    Order the cases by expected frequency, to drop average cost

What about encoding the decisions in a table?
  Replace if-then-else or case statement with an address computation
    Branches are slow and disruptive
  Interpret the table with a skeleton parser, as we did in scanning

Building Table-Driven Top-down Parsers

Strategy
  Encode knowledge in a table
  Use a standard skeleton parser to interpret the table

   0  Goal    → Expr
   1  Expr    → Term Expr'
   2  Expr'   → + Term Expr'
   3          | - Term Expr'
   4          | ε
   5  Term    → Factor Term'
   6  Term'   → * Factor Term'
   7          | / Factor Term'
   8          | ε
   9  Factor  → ( Expr )
  10          | number
  11          | identifier

Example
  The non-terminal Factor has 3 expansions
    ( Expr ) or Identifier or Number
  Table might look like:

                           Terminal Symbols
  Non-terminal
  Symbols       +     -     *     /     Id.   Num.  (     )     EOF
  Factor        err   err   err   err   11    10    9     err   err

  Cannot expand Factor into an operator ⇒ error
  Expand Factor by rule 10 with input "number"

Building Top-down Parsers

Building the complete table
  Need a row for every NT & a column for every T

LL(1) Table for the Expression Grammar

            +     -     *     /     id    num   (     )     EOF
  Goal      err   err   err   err   0     0     0     err   err
  Expr      err   err   err   err   1     1     1     err   err
  Expr'     2     3     err   err   err   err   err   4     4
  Term      err   err   err   err   5     5     5     err   err
  Term'     8     8     6     7     err   err   err   8     8
  Factor    err   err   err   err   11    10    9     err   err    (row we built earlier)

Table differs from Figure 3.11 on page 112 in EaC2e because the order of terminals (columns) is different.

Building Top-down Parsers


Building the complete table
Need a row for every NT & a column for every T
Need an interpreter for the table (skeleton parser)


LL(1) Skeleton Parser

word ← NextWord()                          // Initial conditions, including
push EOF onto Stack                        // a stack to track local goals
push the start symbol, S, onto Stack
TOS ← top of Stack
loop forever
    if TOS = EOF and word = EOF then
        break & report success             // exit on success
    else if TOS is a terminal then
        if TOS matches word then
            pop Stack                      // recognized TOS
            word ← NextWord()
        else report error looking for TOS  // error exit
    else                                   // TOS is a non-terminal
        if TABLE[TOS,word] is A → B1B2…Bk then
            pop Stack                      // get rid of A
            push Bk, Bk-1, …, B1           // in that order
        else break & report error expanding TOS
    TOS ← top of Stack
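The skeleton parser can be exercised directly on the expression grammar. What follows is a minimal sketch, assuming the rule numbering 0-11 used for the right-recursive expression grammar and token kinds spelled "id", "num", and "eof"; the names `PRODUCTIONS`, `TABLE`, and `ll1_parse` are illustrative, and missing table entries stand for error.

```python
# Rule number -> (lhs, rhs); an empty rhs is an epsilon production.
PRODUCTIONS = {
    0: ("Goal", ("Expr",)),
    1: ("Expr", ("Term", "Expr'")),
    2: ("Expr'", ("+", "Term", "Expr'")),
    3: ("Expr'", ("-", "Term", "Expr'")),
    4: ("Expr'", ()),
    5: ("Term", ("Factor", "Term'")),
    6: ("Term'", ("*", "Factor", "Term'")),
    7: ("Term'", ("/", "Factor", "Term'")),
    8: ("Term'", ()),
    9: ("Factor", ("(", "Expr", ")")),
    10: ("Factor", ("num",)),
    11: ("Factor", ("id",)),
}

# TABLE[non-terminal][lookahead] -> rule number; absent entries are errors.
TABLE = {
    "Goal": {"(": 0, "id": 0, "num": 0},
    "Expr": {"(": 1, "id": 1, "num": 1},
    "Expr'": {"+": 2, "-": 3, ")": 4, "eof": 4},
    "Term": {"(": 5, "id": 5, "num": 5},
    "Term'": {"*": 6, "/": 7, "+": 8, "-": 8, ")": 8, "eof": 8},
    "Factor": {"(": 9, "num": 10, "id": 11},
}


def ll1_parse(words, start="Goal"):
    """Return the rule numbers applied, in order (a leftmost derivation)."""
    stream = iter(list(words) + ["eof"])
    word = next(stream)
    stack = ["eof", start]
    applied = []
    while True:
        tos = stack[-1]
        if tos == "eof" and word == "eof":
            return applied                      # exit on success
        if tos not in TABLE:                    # TOS is a terminal
            if tos != word:
                raise SyntaxError(f"looking for {tos!r}, found {word!r}")
            stack.pop()                         # recognized TOS
            word = next(stream)
        else:                                   # TOS is a non-terminal
            rule = TABLE[tos].get(word)
            if rule is None:
                raise SyntaxError(f"cannot expand {tos} on {word!r}")
            stack.pop()                         # get rid of A
            stack.extend(reversed(PRODUCTIONS[rule][1]))  # push Bk ... B1
            applied.append(rule)
```

Because the stack holds the unexpanded lower fringe, the returned rule list is exactly the leftmost derivation of the input.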

Building Top-down Parsers


Building the complete table
Need a row for every NT & a column for every T
Need a table-driven interpreter for the table
Need an algorithm to build the table
Filling in TABLE[X,y], X ∈ NT, y ∈ T
  1. entry is the rule X → β, if y ∈ FIRST+(X → β)
  2. entry is error if rule 1 does not define it
If any entry has more than one rule, G is not LL(1)
  Incrementally tests the LL(1) criterion on each NT.
  An efficient way to determine if a grammar is LL(1)

This algorithm is the LL(1) table construction algorithm

In Lab 2, you will build a recursive descent parser for a modified form of BNF and
build LL(1) tables for the grammars that are LL(1).
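The table-filling rule can be sketched end to end, including the FIRST, FOLLOW, and FIRST+ computations it depends on. This is an illustrative sketch, not the Lab 2 solution: the function names, the grammar encoding as a list of (lhs, rhs) pairs, and the "eof" marker are assumptions, and ε is dropped from FIRST+ because it never labels a table column.

```python
EPS, EOF = "ε", "eof"


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols; contains EPS if the whole sequence can vanish."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def build_ll1_table(productions, start):
    """Return (table, conflicts); the grammar is LL(1) iff conflicts is empty."""
    nonterminals = {lhs for lhs, _ in productions}
    terminals = ({s for _, rhs in productions for s in rhs} - nonterminals) | {EOF}
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in nonterminals})
    changed = True
    while changed:                                  # fixed point for FIRST
        changed = False
        for lhs, rhs in productions:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    follow = {nt: set() for nt in nonterminals}
    follow[start].add(EOF)
    changed = True
    while changed:                                  # fixed point for FOLLOW
        changed = False
        for lhs, rhs in productions:
            trailer = set(follow[lhs])
            for sym in reversed(rhs):
                if sym in nonterminals:
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
                    if EPS in first[sym]:
                        trailer |= first[sym] - {EPS}
                    else:
                        trailer = set(first[sym])
                else:
                    trailer = {sym}
    table = {nt: {} for nt in nonterminals}
    conflicts = []
    for number, (lhs, rhs) in enumerate(productions):
        first_plus = first_of_string(rhs, first)
        if EPS in first_plus:                       # FIRST+ adds FOLLOW(lhs)
            first_plus = (first_plus - {EPS}) | follow[lhs]
        for t in first_plus:
            if t in table[lhs]:                     # two rules in one entry
                conflicts.append((lhs, t))
            table[lhs][t] = number
    return table, conflicts


# The right-recursive expression grammar, listed in rule-number order 0..11.
EXPR_GRAMMAR = [
    ("Goal", ("Expr",)), ("Expr", ("Term", "Expr'")),
    ("Expr'", ("+", "Term", "Expr'")), ("Expr'", ("-", "Term", "Expr'")),
    ("Expr'", ()), ("Term", ("Factor", "Term'")),
    ("Term'", ("*", "Factor", "Term'")), ("Term'", ("/", "Factor", "Term'")),
    ("Term'", ()), ("Factor", ("(", "Expr", ")")),
    ("Factor", ("num",)), ("Factor", ("id",)),
]
EXPR_TABLE, EXPR_CONFLICTS = build_ll1_table(EXPR_GRAMMAR, "Goal")
```

Running the same function on a left-recursive grammar such as E → E + n | n reports a conflict, which is the incremental LL(1) test described above.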

Recap of Top-down Parsing

Top-down parsers build syntax tree from root to leaves
Left-recursion causes non-termination in top-down parsers
  Transformation to eliminate left recursion
  Transformation to eliminate common prefixes in right recursion
FIRST, FIRST+, & FOLLOW sets + LL(1) condition
  LL(1) uses left-to-right scan of the input, leftmost derivation of the sentence, and 1 word lookahead
  LL(1) condition means grammar works for predictive parsing
Given an LL(1) grammar, we can
  Build a recursive descent parser
  Build a table-driven LL(1) parser
LL(1) parser doesn't build the parse tree
  Keeps lower fringe of partially complete tree on the stack

Parsing Techniques

Top-down parsers (LL(1), recursive descent)
  Start at the root of the parse tree and grow toward leaves
  Pick a production & try to match the input
  Bad "pick" ⇒ may need to backtrack
  Some grammars are backtrack-free

Bottom-up parsers (LR(1), operator precedence)
  Start at the leaves and grow toward root
  As input is consumed, encode possibilities in an internal state
  Start in a state valid for legal first tokens
  We can make the process deterministic

[Figure: parse tree for x + 2 * y, with leaves <id,x>, <num,2>, and <id,y>]

Bottom-up parsers can recognize a strictly larger class of grammars than can top-down parsers.

Bottom-up Parsing (definitions)

The point of parsing is to construct a derivation
A derivation consists of a series of rewrite steps

  S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn–1 ⇒ γn ⇒ sentence

Each γi is a sentential form
  If γ contains only terminal symbols, γ is a sentence in L(G)
  If γ contains 1 or more non-terminals, γ is a sentential form
To get γi from γi–1, expand some NT A ∈ γi–1 by using A → β
  Replace the occurrence of A ∈ γi–1 with β to get γi
  In a leftmost derivation, it would be the first NT A ∈ γi–1

A left-sentential form occurs in a leftmost derivation
A right-sentential form occurs in a rightmost derivation
Bottom-up parsers build a rightmost derivation in reverse
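The difference between leftmost and rightmost derivations is easy to see by replaying both mechanically. The sketch below is illustrative only: it assumes a toy grammar (Goal → a A B e, A → A b c | b, B → d) with single-character symbols apart from Goal, and the names `derive`, `RIGHTMOST`, and `LEFTMOST` are made up for this example.

```python
NONTERMINALS = {"Goal", "A", "B"}


def derive(start, steps, rightmost=True):
    """Apply each (lhs, rhs) step to the rightmost (or leftmost) non-terminal.

    Returns the list of sentential forms, starting with [start].
    """
    form = [start]
    forms = [form]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to expand")
        i = positions[-1] if rightmost else positions[0]
        if form[i] != lhs:
            raise ValueError(f"next non-terminal is {form[i]}, not {lhs}")
        form = form[:i] + list(rhs) + form[i + 1:]   # replace A with β
        forms.append(form)
    return forms


# Both derivations produce the sentence abbcde, via different sentential forms.
RIGHTMOST = [("Goal", "aABe"), ("B", "d"), ("A", "Abc"), ("A", "b")]
LEFTMOST = [("Goal", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

right_forms = ["".join(f) for f in derive("Goal", RIGHTMOST)]
left_forms = ["".join(f) for f in derive("Goal", LEFTMOST, rightmost=False)]
```

Reading `right_forms` from last to first gives the order in which a bottom-up parser would discover the same sentential forms.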

Bottom-up Parsing (definitions)

A bottom-up parser builds a derivation by working from the input sentence back toward the start symbol S

  S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn–1 ⇒ γn ⇒ sentence
  (bottom-up: the parser works through this chain from right to left)

To reduce γi to γi–1, match some rhs β against γi, then replace β with its corresponding lhs, A. (assuming the production is A → β)

In terms of the parse tree, it works from leaves to root
  Nodes with no parent in a partial tree form its upper fringe
  Since each replacement of β with A shrinks the upper fringe, we call it a reduction.
  "Rightmost derivation in reverse" processes words left to right

The parse tree need not be built, it can be simulated
  |parse tree nodes| = |terminal symbols| + |reductions|

"Shrinks the upper fringe" implies that the terminals are all instantiated, at least implicitly.

Finding Reductions

Consider the grammar

  0  Goal → a A B e
  1  A    → A b c
  2       | b
  3  B    → d

And the input string abbcde

  Sentential     Next Reduction
  Form           Prod'n   Pos'n
  abbcde         2        2
  a A bcde       1        4
  a A de         3        3
  a A B e        0        4
  Goal

The trick is scanning the input and finding the next reduction.
The mechanism for doing this must be efficient.
"The reductions are obvious from the derivation." Of course, building the derivation is not a practical way to find it.

Finding Reductions

Consider the grammar

  0  Goal → a A B e
  1  A    → A b c
  2       | b
  3  B    → d

And the input string abbcde

  Sentential     Next Reduction
  Form           Prod'n   Pos'n
  abbcde         2        2
  a A bcde       1        4
  a A de         3        3
  a A B e        0        4
  Goal

The trick is scanning the input and finding the next reduction
The mechanism for doing this must be efficient
"Position" specifies where the right end of β occurs in the current sentential form.

While the process of finding the next reduction appears to be almost oracular,
it can be automated in an efficient way for a large class of grammars.
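The reduction sequence for abbcde can be replayed mechanically once each step is written as a (production, position) pair. This is a small sketch under stated assumptions: rule 3 is taken to be B → d, positions are 1-based and mark the right end of β, and the names `GRAMMAR`, `reduce_at`, and `replay` are illustrative. It checks a given sequence of reductions; it does not find them.

```python
# Rule number -> (lhs, rhs) for the toy grammar.
GRAMMAR = {
    0: ("Goal", ["a", "A", "B", "e"]),
    1: ("A", ["A", "b", "c"]),
    2: ("A", ["b"]),
    3: ("B", ["d"]),
}


def reduce_at(form, prod, pos):
    """Replace the rhs of rule `prod`, whose right end sits at 1-based `pos`, by its lhs."""
    lhs, rhs = GRAMMAR[prod]
    start = pos - len(rhs)
    if start < 0 or form[start:pos] != rhs:
        raise ValueError(f"rule {prod} does not match at position {pos}")
    return form[:start] + [lhs] + form[pos:]


def replay(sentence, reductions):
    """Return every sentential form visited while applying the reductions in order."""
    forms = [list(sentence)]
    for prod, pos in reductions:
        forms.append(reduce_at(forms[-1], prod, pos))
    return forms


steps = [(2, 2), (1, 4), (3, 3), (0, 4)]
forms = [" ".join(f) for f in replay("abbcde", steps)]
```

Reversing `forms` yields the rightmost derivation Goal ⇒ a A B e ⇒ a A d e ⇒ a A b c d e ⇒ a b b c d e.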

Finding Reductions (Handles)

The parser finds a substring β of the tree's frontier that derives from expansion by A → β in the previous step in the rightmost derivation
Informally, we call this substring β a handle
Formally,
  A handle of a right-sentential form γ is a pair <A → β, k> where
  A → β ∈ P and k is the position in γ of β's rightmost symbol.
  If <A → β, k> is a handle, then replacing β at k with A produces the right-sentential form from which γ is derived in the rightmost derivation.
Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols
  ⇒ the parser doesn't need to scan (much) past the handle

Handles are the most mystifying aspect of bottom-up, shift-reduce parsers. It usually takes a couple of lectures.

Using Handles: a Bottom-up Parser

As with the top-down parser, we will introduce a stack to hold the upper fringe of the partially completed parse tree.
A simple shift-reduce parser:

push INVALID
word ← NextWord( )
repeat until (top of stack = Goal and word = EOF)
    if the top of the stack is a handle A → β
        then // reduce β to A
            pop |β| symbols off the stack
            push A onto the stack
    else if (word ≠ EOF)
        then // shift
            push word
            word ← NextWord( )
    else // need to shift, but out of input
        report an error

This parser is sometimes called a handle-pruning parser.

What happens on an error?
  Parser fails to find a handle
  Thus, it keeps shifting
  Eventually, it consumes all input
This parser reads all input before reporting an error, not a desirable property.
To fix this issue, the parser must recognize the failure to find a handle earlier.
To make shift-reduce parsers practical, we need good error localization in the handle-finding process.
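The shift-reduce loop can be run once something plays the role of the handle finder. The sketch below assumes a hand-written oracle for the toy grammar Goal → a A B e, A → A b c | b, B → d; the function names and the pattern checks in `toy_handle_finder` are assumptions for illustration and are not how an LR(1) parser locates handles.

```python
def shift_reduce(words, find_handle, goal="Goal"):
    """Run the simple shift-reduce loop; return the list of actions taken."""
    stack = ["INVALID"]
    stream = iter(list(words) + ["EOF"])
    word = next(stream)
    trace = []
    while not (stack[-1] == goal and word == "EOF"):
        handle = find_handle(stack, word)
        if handle is not None:                  # reduce β to A
            lhs, rhs = handle
            del stack[len(stack) - len(rhs):]   # pop |β| symbols
            stack.append(lhs)
            trace.append(("reduce", lhs, "".join(rhs)))
        elif word != "EOF":                     # shift
            stack.append(word)
            trace.append(("shift", word))
            word = next(stream)
        else:                                   # need to shift, but out of input
            raise SyntaxError("no handle found and input is exhausted")
    return trace


def toy_handle_finder(stack, word):
    """Hypothetical oracle: recognizes handles of the toy grammar by stack context."""
    if stack[-4:] == ["a", "A", "B", "e"]:
        return ("Goal", ["a", "A", "B", "e"])
    if stack[-3:] == ["A", "b", "c"]:
        return ("A", ["A", "b", "c"])
    if stack[-2:] == ["a", "b"]:                # b is a handle only right after a
        return ("A", ["b"])
    if stack[-1] == "d":
        return ("B", ["d"])
    return None


trace = shift_reduce("abbcde", toy_handle_finder)
reductions = [step for step in trace if step[0] == "reduce"]
```

Note that the oracle must look at context: the second b in abbcde sits on top of the stack but is not a handle. On the erroneous input abcde the loop shifts every remaining word before raising the error, which is the late-detection behavior described above.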


Example

  0  Goal    → Expr
  1  Expr    → Expr + Term
  2          | Expr - Term
  3          | Term
  4  Term    → Term * Factor
  5          | Term / Factor
  6          | Factor
  7  Factor  → ( Expr )
  8          | number
  9          | id

A simple left-recursive form of the classic expression grammar

Bottom-up parsers work with either left-recursive or right-recursive grammars.
The obvious left-recursive grammar is left associative.
I prefer the obvious left-recursive grammar because its associativity matches the standard rules that we were all taught as children.
The examples will use the left-recursive, left-associative grammar.

Example

  0  Goal    → Expr
  1  Expr    → Expr + Term
  2          | Expr - Term
  3          | Term
  4  Term    → Term * Factor
  5          | Term / Factor
  6          | Factor
  7  Factor  → ( Expr )
  8          | number
  9          | id

A simple left-recursive form of the classic expression grammar

  Prod'n   Sentential Form
           Goal
  0        Expr
  2        Expr - Term
  4        Expr - Term * Factor
  9        Expr - Term * <id,y>
  6        Expr - Factor * <id,y>
  8        Expr - <num,2> * <id,y>
  3        Term - <num,2> * <id,y>
  6        Factor - <num,2> * <id,y>
  9        <id,x> - <num,2> * <id,y>

Rightmost derivation of x - 2 * y (the derivation runs from the top row to the bottom row)

Example

  0  Goal    → Expr
  1  Expr    → Expr + Term
  2          | Expr - Term
  3          | Term
  4  Term    → Term * Factor
  5          | Term / Factor
  6          | Factor
  7  Factor  → ( Expr )
  8          | number
  9          | id

A simple left-recursive form of the classic expression grammar

  Prod'n   Sentential Form                Handle <Prod'n, Pos'n>
           Goal
  0        Expr                           <0, 1>
  2        Expr - Term                    <2, 3>
  4        Expr - Term * Factor           <4, 5>
  9        Expr - Term * <id,y>           <9, 5>
  6        Expr - Factor * <id,y>         <6, 3>
  8        Expr - <num,2> * <id,y>        <8, 3>
  3        Term - <num,2> * <id,y>        <3, 1>
  6        Factor - <num,2> * <id,y>      <6, 1>
  9        <id,x> - <num,2> * <id,y>      <9, 1>

Handles for rightmost derivation of x - 2 * y (the parse runs from the bottom row to the top row)

Handles

At this point, handles appear mysterious
  Don't Panic: handles are mysterious
  Next lecture will focus on handles
  If it were easy, it would not have taken Knuth to invent it!

Handles can be discovered in an easy & systematic way
  It just takes another lecture or so to get to that point

If we had a handle-generating oracle, bottom-up parsing would be easy
  We will show how to derive that oracle
  As you might guess, the answer lies in practical application of material from COMP 481

Next Class
  Handles, handles, and more handles