Reading: Hopcroft, Motwani and Ullman, Introduction to Automata Theory, Languages and Computation Section 7.4, pp. 298-302
Parsing Algorithms
CFGs are the basis for describing the (syntactic) structure of NL sentences
Thus, parsing algorithms are the core of NL analysis systems
Recognition vs. Parsing:
Recognition - deciding membership in the language: for a given grammar $G$, an algorithm that, given an input $w$, decides: is $w \in L(G)$?
Parsing - recognition and, in addition, if $w \in L(G)$, producing a parse tree for $w$
Is parsing more difficult than recognition? (time complexity)
Ambiguity - a parse for $w$ or all parses for $w$?
Identifying the correct parse
Ambiguity representation - an input may have exponentially many parses
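To make the claim about exponentially many parses concrete, the sketch below counts the parse trees of a string of n a's under the toy CNF grammar S -> S S | a. The grammar and the function name `count_parses` are assumptions chosen for illustration; they are not taken from the lecture. The counts are the Catalan numbers, which grow exponentially in n.

```python
from math import comb


def count_parses(n):
    """Count parse trees of 'a' * n under the toy grammar S -> S S | a."""
    if n < 1:
        return 0
    counts = [0] * (n + 1)
    counts[1] = 1  # S -> a
    for length in range(2, n + 1):
        # S -> S S: choose where the string is split into two non-empty parts
        counts[length] = sum(counts[k] * counts[length - k]
                             for k in range(1, length))
    return counts[n]


def catalan(m):
    """Closed form of the m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


for n in (1, 2, 3, 4, 5, 10, 20):
    print(n, count_parses(n), catalan(n - 1))
```

The count for length n equals the Catalan number with index n - 1, so a parser that enumerates every tree separately cannot run in polynomial time on such inputs.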
11-711 Algorithms for NLP
Parsing Algorithms
Parsing General CFLs vs. Limited Forms
Efficiency:
Deterministic (LR) languages can be parsed in linear time
A number of parsing algorithms for general CFLs require $O(n^3)$ time
The asymptotically best parsing algorithm for general CFLs requires $O(n^{2.376})$ time, but is not practical
Utility - why parse general grammars and not just CNF?
Grammar intended to reflect actual structure of language
Conversion to CNF completely destroys the parse structure
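As a rough illustration of how conversion to CNF changes parse structure, the sketch below performs only the binarization step of the conversion. The rule VP -> V NP PP, the fresh symbol names X1, X2, ... and the function `binarize` are hypothetical and serve only as an example.

```python
def binarize(rules):
    """Split right-hand sides longer than 2 into chains of binary rules.

    rules: list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Fresh symbols X1, X2, ... are introduced for the new intermediate nodes.
    """
    out = []
    fresh = 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            fresh += 1
            new_sym = f"X{fresh}"
            # keep the first symbol, push the rest under a new node
            out.append((lhs, (rhs[0], new_sym)))
            lhs, rhs = new_sym, rhs[1:]
        out.append((lhs, tuple(rhs)))
    return out


print(binarize([("VP", ("V", "NP", "PP"))]))
# [('VP', ('V', 'X1')), ('X1', ('NP', 'PP'))]
```

A tree built with the binarized rules contains an X1 node that groups NP and PP, a constituent the original grammar never posited, so the original flat VP structure has to be recovered by a separate post-processing step.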
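Let $G$ be a grammar in CNF and $w$ an input string of length $n$
Let $w_{ij}$ be the substring of $w$ of length $j$ starting at position $i$, and let $V_{ij}$ be the set of variables $A$ such that $A \Rightarrow^* w_{ij}$
We start with substrings of length 1: for $1 \le i \le n$, $A \in V_{i1}$ if $A \to w_{i1}$ is a rule in the grammar
We then continue with substrings of length 2, 3, ... For a substring $w_{ij}$, $A \in V_{ij}$ if $w_{ij}$ can be broken into two parts $w_{ik}$ and $w_{i+k,j-k}$, for some $1 \le k < j$, such that:
1. $A \to BC$ is a rule in the grammar
2. $B \in V_{ik}$
3. $C \in V_{i+k,j-k}$
Finally, since $w = w_{1n}$, $w \in L(G)$ if and only if $S \in V_{1n}$
The sets $V_{ij}$ are stored in a table
Note that we only need to fill in entries up to the diagonal: the longest substring starting at $i$ is of length $n - i + 1$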
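The table-filling procedure can be sketched as a short recognizer. This is a minimal sketch under stated assumptions: `V[(i, j)]` holds the variables deriving the substring that starts at position i (1-based) and has length j, the grammar is given as lists of unary and binary CNF rules, and the toy grammar for a^n b^n (S -> A T | A B, T -> S B, A -> a, B -> b) is hypothetical, not the grammar from the lecture.

```python
from collections import defaultdict

# Hypothetical toy CNF grammar for the language a^n b^n, n >= 1
UNARY = [("A", "a"), ("B", "b")]                             # A -> a
BINARY = [("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")]  # A -> B C


def cky_recognize(words, unary, binary, start="S"):
    """Return True if the start symbol derives the whole input."""
    n = len(words)
    if n == 0:
        return False
    V = defaultdict(set)
    # substrings of length 1
    for i in range(1, n + 1):
        V[(i, 1)] = {A for A, a in unary if a == words[i - 1]}
    # substrings of length j = 2, 3, ..., n
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):      # longest substring at i has length n-i+1
            for k in range(1, j):          # split into lengths k and j-k
                left, right = V[(i, k)], V[(i + k, j - k)]
                for A, B, C in binary:
                    if B in left and C in right:
                        V[(i, j)].add(A)
    return start in V[(1, n)]


print(cky_recognize("aabb", UNARY, BINARY))  # True
print(cky_recognize("aab", UNARY, BINARY))   # False
```

Scanning the whole rule list at every split point keeps the sketch short; an implementation meant for large grammars would typically index the binary rules by their right-hand sides instead.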
Example
Consider the following CNF Grammar:
[grammar rules not recoverable from the extracted text]
Example
[table indexed 1 to 5; entries not recoverable from the extracted text]
Step (7) is the most nested and thus gets executed $O(n^3)$ times
Thus, the entire algorithm requires $O(n^3)$ time
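The cubic bound can be checked by counting how often the innermost step runs. The sketch below assumes the usual loop structure of the table-filling algorithm (substring length j, start position i, split point k); the function name `count_inner_steps` is hypothetical. The count comes out to exactly (n^3 - n) / 6, which is $O(n^3)$.

```python
def count_inner_steps(n):
    """Count executions of the innermost step for an input of length n."""
    steps = 0
    for j in range(2, n + 1):          # substring length
        for i in range(1, n - j + 2):  # start position
            for k in range(1, j):      # split point
                steps += 1
    return steps


for n in (5, 10, 20):
    print(n, count_inner_steps(n), (n**3 - n) // 6)
```

Each execution of the innermost step also consults the grammar rules, so the total running time carries an additional factor that depends on the size of the grammar.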
... are stored
When creating back-pointers, create a single back-pointer to the packed representation
This allows a very large number of ambiguities (even exponentially many) to be represented efficiently
Unpacking - producing one or more of the packed parse trees by following the back-pointers
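Packing and unpacking can be sketched as follows. This is one possible layout, not necessarily the lecture's: each table cell keeps a single entry per variable, that entry holds a list of back-pointers, and each back-pointer refers to (variable, cell) pairs rather than to individual trees. The names `cky_parse` and `unpack` and the ambiguous toy grammar S -> S S | a are assumptions for illustration.

```python
from collections import defaultdict


def cky_parse(words, unary, binary):
    """Fill a chart where chart[(i, j)][A] lists the back-pointers for A
    over the substring starting at i (1-based) with length j."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(list))
    for i in range(1, n + 1):
        for A, a in unary:
            if a == words[i - 1]:
                chart[(i, 1)][A].append(a)  # leaf: points to the terminal
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for k in range(1, j):
                l_cell, r_cell = (i, k), (i + k, j - k)
                for A, B, C in binary:
                    if B in chart[l_cell] and C in chart[r_cell]:
                        # one back-pointer per (rule, split), pointing at the
                        # packed entries for B and C, not at single trees
                        chart[(i, j)][A].append((B, l_cell, C, r_cell))
    return chart


def unpack(chart, A, cell):
    """Yield every parse tree for A over the given cell."""
    for bp in chart[cell][A]:
        if isinstance(bp, str):
            yield (A, bp)
        else:
            B, l_cell, C, r_cell = bp
            for left in unpack(chart, B, l_cell):
                for right in unpack(chart, C, r_cell):
                    yield (A, left, right)


chart = cky_parse("aaaa", [("S", "a")], [("S", "S", "S")])
print(len(chart[(1, 4)]["S"]))                 # 3 back-pointers
print(len(list(unpack(chart, "S", (1, 4)))))   # 5 parse trees
```

The top cell stores only three back-pointers for the four-symbol input, while unpacking yields five trees; the chart stays polynomial in size even when the number of trees grows exponentially.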