Computing FIRST and FOLLOW Functions for Feature-Theoretic Grammars
Arturo Trujillo
Computer Laboratory
University of Cambridge
Cambridge CB2 3QG, England
iat@cl.cam.ac.uk

Abstract

This paper describes an algorithm for the computation of FIRST and FOLLOW sets for use with feature-theoretic grammars, in which the value of the sets consists of pairs of feature-theoretic categories. The algorithm preserves as much information from the grammars as possible, using negative restriction to define equivalence classes. Addition of a simple data structure leads to an order of magnitude improvement in execution time over a naive implementation.

1 Introduction

The need for efficient parsing is a constant one in Natural Language Processing. With the advent of feature-theoretic grammars, many of the optimization techniques that were applicable to Context Free (CF) grammars have required modification. For instance, a number of algorithms used to extract parsing tables from CF grammars have involved discarding information which otherwise would have constrained the parsing process (Briscoe and Carroll, 1993). This paper describes an extension to an algorithm that operates over CF grammars to make it applicable to feature-theoretic ones. One advantage of the extended algorithm is that it preserves as much of the information in the grammar as possible.

1.1 FIRST and FOLLOW

In order to make more efficient parsers, it is sometimes necessary to preprocess (compile) a grammar to extract from it top-down information to guide the search during analysis. The first step in the preprocessing stage of several compilation algorithms requires the solution of two functions normally called FIRST and FOLLOW. Intuitively, $FIRST(X)$ gives us the terminal symbols that may appear in initial position in substrings derived from category $X$. $FOLLOW(X)$ gives us the terminals which may immediately follow a substring of category $X$. For example, in the grammar S $\rightarrow$ NP VP; NP $\rightarrow$ det noun; VP $\rightarrow$ vtra NP, we get:

$FIRST(S) = FIRST(NP) = \{det\}$;
$FIRST(VP) = \{vtra\}$;
$FOLLOW(NP) = \{vtra, \$\}$;
$FOLLOW(S) = FOLLOW(VP) = \{\$\}$ (\$ marks end of input)

These two functions are important in a large range of algorithms used for constructing efficient parsers. For example, the LR-parser construction algorithm given in Aho et al. (1986:232) uses FIRST to compute item closure values. Another example is the computation of the $\angle^*$ relation which is used in the construction of generalized left-corner parsers (Nederhof, 1993); this relation is effectively an extension of the function FIRST.
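The values given for the example grammar of Section 1.1 can be reproduced with a short fixed-point computation. The following Python sketch is illustrative only and is not the implementation discussed in this paper: the names `compute_first`, `compute_follow`, `first_of_string`, `EPS` and `END` are assumptions introduced here, and categories are treated as atomic symbols, which is exactly the CF setting.

```python
EPS = "ε"  # hypothetical marker for the empty string
END = "$"  # end-of-input marker, as in the text


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given FIRST of each single symbol."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)  # every symbol (or the empty sequence) can derive ε
    return result


def compute_first(rules, terminals):
    """Iterate over the rules until no FIRST set grows any further."""
    first = {t: {t} for t in terminals}
    for lhs, _ in rules:
        first.setdefault(lhs, set())
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(rules, first, start):
    """Iterate over the rules until no FOLLOW set grows any further."""
    follow = {lhs: set() for lhs, _ in rules}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for i, sym in enumerate(rhs):
                if sym not in follow:  # terminals get no FOLLOW set here
                    continue
                tail = first_of_string(rhs[i + 1:], first)
                new = tail - {EPS}
                if EPS in tail:  # nothing (or only ε) follows sym in this rule
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


# The example grammar: S -> NP VP; NP -> det noun; VP -> vtra NP
RULES = [("S", ["NP", "VP"]), ("NP", ["det", "noun"]), ("VP", ["vtra", "NP"])]
TERMINALS = {"det", "noun", "vtra"}

FIRST = compute_first(RULES, TERMINALS)
FOLLOW = compute_follow(RULES, FIRST, "S")
```

Because symbols are compared by identity here, nothing in this sketch can record bindings between a category and its FIRST or FOLLOW value.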

2 Computing FIRST and FOLLOW

We propose an algorithm for the computation of FIRST values which handles feature-theoretic grammars without having to extract a CF backbone from them; the approach is easily adapted to compute FOLLOW values too. An improvement to the algorithm is presented towards the end of the paper. Before describing the algorithm, we give a well known procedure for computing FIRST for CF grammars (taken from Aho et al. (1986:189), where $\epsilon$ is the empty string):

"To compute $FIRST(X)$ for all grammar symbols $X$, apply the following rules until no more terminals or $\epsilon$ can be added to any FIRST set.

1. If $X$ is terminal, then $FIRST(X)$ is $\{X\}$.

2. If $X \rightarrow \epsilon$ is a production, then add $\epsilon$ to $FIRST(X)$.

3. If $X$ is nonterminal and $X \rightarrow Y_1 Y_2 \ldots Y_k$ is a production, then place $a$ in $FIRST(X)$ if for some $i$, $a$ is in $FIRST(Y_i)$, and $\epsilon$ is in all of $FIRST(Y_1) \ldots FIRST(Y_{i-1})$; that is, $Y_1 \ldots Y_{i-1} \Rightarrow^* \epsilon$. If $\epsilon$ is in $FIRST(Y_j)$ for all $j = 1, 2, \ldots, k$, then add $\epsilon$ to $FIRST(X)$.

Now, we can compute FIRST for any string $X_1 X_2 \ldots X_n$ as follows. Add to $FIRST(X_1 X_2 \ldots X_n)$ all of the non-$\epsilon$ symbols of $FIRST(X_1)$. Also add the non-$\epsilon$ symbols of $FIRST(X_2)$ if $\epsilon$ is in $FIRST(X_1)$, the non-$\epsilon$ symbols of $FIRST(X_3)$ if $\epsilon$ is in both $FIRST(X_1)$ and $FIRST(X_2)$, and so on. Finally, add $\epsilon$ to $FIRST(X_1 X_2 \ldots X_n)$ if, for all $i$, $FIRST(X_i)$ contains $\epsilon$."

This algorithm will form the basis of our proposal.

3 Compiling Feature-Theoretic Grammars

3.1 Equivalence Classes

The main reason why the above algorithm cannot be used with feature-theoretic grammars is that in general the number of possible nonterminals allowed by the grammar is infinite. One of the simplest ways of showing this is where a grammar accumulates the orthographic representation of its terminals as one of its feature values. It is not difficult to see how one can have an infinite number of NPs in such a grammar:

NP[orth: the dog]
NP[orth: the fat dog]
NP[orth: the big fat dog], etc.

This means that $FIRST$(NP[orth: the dog]) would have a different value to $FIRST$(NP[orth: the fat dog]) even though they share the same leftmost terminal. That is, the feature structure for the substring "det adj noun" will be different to that for "det noun" even though they have the same starting symbol. This point is important since similar situations arise with the subcategorization frame of verbs and the semantic value of categories in contemporary theories of grammar (Pollard and Sag, 1987). Without modification, the algorithm above would not terminate.

The solution to this problem is to define a finite number of equivalence classes into which the infinite number of nonterminals may be sorted. These classes may be established in a number of ways; the one we have adopted is that presented by Harrison and Ellison (1992) which builds on the work of Shieber (1985): it introduces the notion of a negative restrictor to define equivalence classes. In this solution a predefined portion of a category (a specific set of paths) is discarded when determining whether a category belongs to an equivalence class or not. For instance, in the above example we could define the negative restrictor to be {orth}. Applying this negative restrictor to each of the three NPs above would discard the information in the 'orth' feature to give us three equivalent nonterminals. It is clear that the restrictor must be such that it discards features which in one way or another give rise to an infinite number of nonterminals. Unfortunately, termination is not guaranteed for all restrictors, and furthermore, the best restrictor cannot be chosen automatically since it depends on the amount of grammatical information that is to be preserved. Thus, selection

of an appropriate restrictor will depend on the particular grammar or system used.

3.2 Value Sharing

Another problem with the algorithm above is that reentrancies between a category and its FIRST and FOLLOW values are not preserved in the solution to these functions; this is because the algorithm assumes atomic symbols and these cannot encode explicitly shared information between categories. For example, consider the following naive grammar:

S $\Rightarrow$ NP[agr: X] VP[agr: X]
VP[agr: X] $\Rightarrow$ Vint[agr: X]
NP[agr: X] $\Rightarrow$ Det N[agr: X]

We would like the solution of $FOLLOW(N)$ to include the binding of the 'agr' feature such that the value of FOLLOW resembled: $FOLLOW$(N[agr: X]) = Vint[agr: X]. But the algorithm above, even with a restrictor, would not preserve such a binding since the addition of a new category to $FOLLOW(N)$ is done independently of the bindings between the new category and N.

4 The Basic Algorithm

We propose an algorithm which, rather than construct a set of categories as the value of FIRST and FOLLOW, constructs a set of pairs each of which represents a category and its FIRST or FOLLOW category, with all the correct bindings explicitly encoded. For instance, for the above example, the pair (VP[agr: X], Vint[agr: X]) would be in the set representing the value of the function FIRST. In the next section the algorithm for computing FIRST is described; computation of FOLLOW proceeds in a similar fashion.

4.1 Solving FIRST

When modifying the algorithm of Section 2 we note that each occurrence of a category in the grammar is potentially distinct from every other category. In addition, for each category we need to remember all the reentrancies between it and the daughters within the rule in which it occurs. Finally, we assume that any category in a rule which can unify with a lexical category is marked in some way, say by using the feature-value pair 'ter: +', and that non-terminal categories must unify with the mother of some rule in the grammar; the latter condition is necessary because the algorithm only computes the solution of FIRST for lexical categories or for categories that occur as mothers.

In computing FIRST we iterate over all the rules in the grammar, treating the mother of each rule as the category for which we are trying to find a FIRST value. Throughout each iteration, unification of a daughter with the lhs of an element of FIRST results in a modified rule and a modified pair in which bindings between the mother category and the rhs of the pair are established. The modified mother and rhs are then used to construct the pair which is added to FIRST. For instance, given rule $X \rightarrow Y$ and pair $(L, R)$, we unify $Y$ and $L$ to give $X' \rightarrow Y'$ and $(L', R')$; from these the pair $(X', R')$ is constructed and added to FIRST.

The algorithm assumes an operation $+$ which constructs a set $S' = S + p$ in the following way: if pair $p$ subsumes an element $a$ of $S$ then $S' = (S \setminus \{a\}) \cup \{p\}$; if $p$ is subsumed by an element of $S$ then $S' = S$; otherwise $S' = S \cup \{p\}$. It should be noted that the pairs constituting the value of FIRST can themselves be compared using the subsumption relation, in which reentrant values are subsumed by non-reentrant ones, and combined using the unification operation. Thus in the principal step of the algorithm, a new pair is constructed as described above, a restrictor is applied to it, and the resulting, restricted pair is $+$-added to FIRST. The algorithm is as follows:

1. Initialise $First = \{\}$.

2. Run through all the daughters in the grammar. If $X$ is pre-terminal, then $First = First + (X, X)!\Phi$ (where $(X, X)!\Phi$ means apply the negative restrictor $\Phi$ to the pair $(X, X)$).

3. For each rule in the grammar with mother

$X$, apply steps 4 and 5 until no more changes are made to $First$.

4. If the rule is $X \rightarrow \epsilon$, then $First = First + (X, \epsilon)!\Phi$.

5. If the rule is $X \rightarrow Y_1 \ldots Y_i \ldots Y_k$, then $First = First + (X', a)!\Phi$ if $(Y'_i, a)$ has successfully unified with an element of $First$, and $(Y'_1, \epsilon_1) \ldots (Y'_{i-1}, \epsilon_{i-1})$ have all successfully and simultaneously unified with members of $First$. Also, $First = First + (X', \epsilon)!\Phi$ if $(Y'_1, \epsilon_1) \ldots (Y'_k, \epsilon_k)$ have all successfully and simultaneously unified with elements of $First$.

6. Now, for any string of categories $X_1 \ldots X_i \ldots X_n$, $First = First + (X'_1 \ldots X'_n, a)!\Phi$ if $(X'_1, a)$ has successfully unified with an element of $First$, and $a \neq \epsilon$. Also, for $i = 2 \ldots n$, $First = First + (X'_1 \ldots X'_n, a)!\Phi$ if $(X'_i, a)$ has successfully unified with an element of $First$, $a \neq \epsilon$, and $(X'_1, \epsilon_1) \ldots (X'_{i-1}, \epsilon_{i-1})$ have all successfully and simultaneously unified with members of $First$. Finally, $First = First + (X'_1 \ldots X'_n, \epsilon)!\Phi$ if $(X'_1, \epsilon_1) \ldots (X'_n, \epsilon_n)$ have all successfully and simultaneously unified with members of $First$. (This step may be computed on demand.)

One observation on this algorithm is in order. The last action of steps 5 and 6 adds $\epsilon$ as a possible value of FIRST for a mother category or a string of categories; such a value results when all daughters or categories have $\epsilon$ as their FIRST value. Since most grammatical descriptions assign a category to $\epsilon$ (e.g. to bind onto it information necessary for correct gap threading), the pairs $(X', \epsilon)$ or $(X'_1 \ldots X'_n, \epsilon)$ should have bindings between their two elements; this creates the problem of deciding which of the $\epsilon$s in the FIRST pairs to use, since it is possible in principle that each of these will have a different value for $\epsilon$. In our implementation, the pair added to $First$ in these situations consists of the mother category or the string of categories and the most general category for $\epsilon$ as defined by the grammar, thus effectively ignoring any bindings that $\epsilon$ may have within the constructed pair. A more accurate solution would have been to compute multiple pairs with $\epsilon$, construct their least upper bound, and then add this to $First$. However, in our implementation this solution has not proven necessary.

S $\Rightarrow$ NP[agr: X, slash: NULL] VP[agr: X, slash: NULL]
S $\Rightarrow$ NP[slash: NULL] NP[agr: X, slash: NULL] VP[agr: X, slash: NP]
VP[agr: X, slash: Y] $\Rightarrow$ Vtra[agr: X, ter: +] NP[slash: Y]
NP[agr: X, slash: NULL] $\Rightarrow$ Det[ter: +] N[agr: X, ter: +]
NP[slash: NP] $\Rightarrow$ $\epsilon$

Figure 1: Example grammar with value sharing.

4.2 Example

Assuming the grammar in Fig. 1 and the negative restrictor $\Phi$ = {slash}, the following is a simplified run through the algorithm:

• $First$ = {}

• After processing all pre-terminal categories, $First$ = {(Det, Det), (N, N), (Vtra, Vtra)} (obvious bindings not shown).

• After the first iteration, $First$ = {(Det, Det), (N, N), (Vtra, Vtra), (VP[agr: X], Vtra[agr: X]), (NP, Det), (NP, $\epsilon$)}

• Since 'slash' is in $\Phi$, any of the NPs in the grammar will unify with the lhs of (NP, $\epsilon$) and hence S will have Vtra as part of its FIRST value. $First$ = {.., (VP[agr: X], Vtra[agr: X]), (NP, Det), (NP, $\epsilon$), (S, Det), (S, Vtra)}

• The next iteration adds nothing and the first stage of the algorithm terminates.

The second stage (step 6) is done on demand, for example to compute state transitions for a parsing table, in order to avoid the expense of computing FIRST for all possible substrings of categories. For instance, to compute FIRST for the string [NP NP VP] the algorithm works as follows:

• $First$ = {.., (VP[agr: X], Vtra[agr: X]), (NP, Det), (NP, $\epsilon$), ...}

• After considering the first NP: $First$ = {.., ([NP NP VP], Det)}.

• Consideration of the second NP in the input string results in no changes to $First$, given the semantics of $+$, since the pair that it would have added, ([NP NP VP], Det), is already in $First$.

• Since NPs can rewrite as $\epsilon$ (i.e. (NP, $\epsilon$) is in $First$), the VP is considered as well: $First$ = {.., ([NP NP VP], Det), ([NP NP VP], Vtra)}.

• Finally, ([NP NP VP], $\epsilon$) may not be added since (VP, $\epsilon$) does not unify with any element of $First$.

5 Improving the Search Through First

If the algorithm is run as presented, each iteration through the grammar rules becomes slower and slower. The reason is that, in step 5, when searching $First$ to create a new pair $(X', a)$, every pair in $First$ is considered and unification of its lhs with the relevant daughter of $X$ attempted. Since each iteration normally adds pairs to $First$, each iteration involves a search through a larger and larger set; furthermore, this search involves unification, and in the case of a successful match, the subsequent construction and addition to $First$ also requires subsumption checks. All of these operations combine to make each additional element in $First$ have a strong effect on the performance of the algorithm. We therefore need to minimize the number of pairs searched.

Considering the dependencies that exist between pairs in $First$, one notices that once a pair has been considered in relation to all the rules in the grammar, the effect of that pair has been completely determined. That is, after a pair is added to $First$ it need only be considered up to and including the rule from which it was derived, after which time it may be excluded from further searches. For example, take the previous grammar, and in particular the value of $First$ after the first iteration through the algorithm. The pair (NP, Det), added because of the rule NP[agr: X, slash: NULL] $\Rightarrow$ Det[ter: +] N[agr: X, ter: +], has to be considered only once by every rule in the grammar; after that, this pair cannot be involved in the construction of new values.

A simple data structure which keeps track of those pairs that need to be searched at any one time was added to the algorithm; the data structure took the form of a list of pointers to active pairs in $First$, where an active pair is one which has not been considered by the rule from which it was constructed. For example, the pair (NP, Det) would be active for a complete iteration from the moment that the corresponding rule introduced it until that rule is visited again during the second iteration. The effect of this policy is to allow each pair in $First$ to be tested against each rule exactly once and then be excluded from subsequent searches; this greatly reduces the number of pairs considered for each iteration.

Using the Typed Feature Structure system (the LKB) of Briscoe et al. (1993), we wrote two grammars and tested the algorithm on them. Table 1 shows the average number of pairs considered for each iteration compared to the average number of pairs in $First$.

           13 Rule Grammar        21 Rule Grammar
           Considered   Total     Considered   Total
Iter. 1    3.5          3.5       8.4          8.4
Iter. 2    7.5          10.7      9.7          18.7
Iter. 3    1.2          12.0      1.0          19.0

Table 1: Average number of pairs per iteration.

As we can see, after the first iteration the number of pairs that needs to be considered is smaller (much smaller for the final iteration) than the total number of pairs in $First$. Similar improvements in performance were obtained for the computation of FOLLOW.

6 Related Research

The extension to the LR algorithm presented by Nakazawa (1991) uses a similar approach to that described here; the functions involved, however, are those necessary for the construction of an LR parsing table (i.e. the GOTO and ACTION functions). One technical difference between the two approaches is that he uses positive restrictors (Shieber 1985) instead of negative ones. In addition, both of his algorithms differ in another way from the algorithm described here: they add items to a set using simple set addition, whereas in the algorithm of Section 4.1 we add elements using the operator $+$. Furthermore, when computing the closure of a set of items, both of the algorithms there ignore the effect that unification has on the categories in the rules.

For example, the states of an LR parser are computed using the closure operation on a set $I$ of dotted rules or items. In Nakazawa's algorithms computation of this closure proceeds as follows: if dotted rule $\langle A \rightarrow w.Bx \rangle$ is in $I$, then add a dotted rule $\langle C \rightarrow .y \rangle$ to the closure of $I$, where $C$ and $B$ unify. This ignores the fact that both dotted rules may be modified after unification, and therefore his algorithm leads to less restricted $I$ values than those implicit in the grammar. To adapt our algorithm to the computation of the closure of $I$ for a feature-theoretic grammar would involve using a set of pairs of dotted rules as the value of $I$.

7 Conclusion

We have extended an algorithm that manipulates CF grammars to allow it to handle feature-theoretic ones. It was shown how most of the information contained in the grammar rules may be preserved by using a set of pairs as the value of a function and by using the notion of subsumption to update this set. Although the algorithm has in fact been used to adapt the constraint propagation algorithm of Brew (1992) to phrase structure grammars, the basic idea should be applicable to the rest of the functions needed for constructing LR tables. However, such adaptations are left as a topic for future research.

Finally, improvements in speed obtained with the active pairs mechanism of Section 5 are of an order of magnitude in an implementation using Common Lisp.

Acknowledgements

This work was funded by the UK SERC. I am very grateful to Ted Briscoe, John Carroll, Mark-Jan Nederhof, Ann Copestake and two anonymous reviewers. All remaining errors are mine.

References

Aho, A. V., Sethi, R., and Ullman, J. D. (1986). Compilers: Principles, Techniques, and Tools. Addison Wesley, Reading, MA.

Brew, C. (1992). Letting the cat out of the bag: Generation for Shake-and-Bake MT. In Proceedings of COLING '92, pp. 610-16, Nantes, France.

Briscoe, E., Copestake, A., and de Paiva, V., eds. (1993). Inheritance, Defaults and the Lexicon. Cambridge University Press, Cambridge, UK.

Briscoe, E. J. and Carroll, J. (1993). Generalised Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars. Computational Linguistics, 19(1), pp. 25-60.

Harrison, S. P. and Ellison, T. M. (1992). Restriction and termination in parsing with feature-theoretic grammars. Computational Linguistics, 18(4), pp. 519-30.

Nakazawa, T. (1991). An Extended LR Parsing Algorithm for Grammars using Feature-Based Syntactic Categories. In Proceedings of the Fifth European Conference of the ACL, pp. 69-74, Berlin, Germany.

Nederhof, M. (1993). Generalized Left-Corner Parsing. In Proceedings of the Sixth European Conference of the ACL, pp. 305-314, Utrecht, The Netherlands.

Pollard, C. and Sag, I. (1987). Information Based Syntax and Semantics: Vol. 1. Lecture Notes. CSLI, Stanford, CA.

Shieber, S. M. (1985). Using restriction to extend parsing algorithms for complex-feature-based formalisms. In Proceedings of the 23rd Annual Conference of the ACL, pp. 145-52, Chicago, IL.
