
11-711 Algorithms for NLP

Introduction to Parsing Algorithms
The CYK Parsing Algorithm

Reading: Hopcroft, Motwani and Ullman, Introduction to Automata Theory, Languages and Computation Section 7.4, pp. 298-302

Parsing Algorithms
CFGs are the basis for describing the (syntactic) structure of NL sentences
Thus, parsing algorithms are at the core of NL analysis systems
Recognition vs. Parsing:
  Recognition: deciding membership in the language. For a given grammar $G$, an algorithm that, given an input $w$, decides: is $w \in L(G)$?
  Parsing: recognition + producing a parse tree for $w$
Is parsing more difficult than recognition? (time complexity)
Ambiguity: a parse for $w$, or all parses for $w$?
  Identifying the correct parse
  Ambiguity representation: an input may have exponentially many parses

Parsing Algorithms
Parsing General CFLs vs. Limited Forms
Efficiency:
  Deterministic (LR) languages can be parsed in linear time
  A number of parsing algorithms for general CFLs require $O(n^3)$ time
  The asymptotically best parsing algorithm for general CFLs requires $O(n^{2.376})$, but is not practical

Utility: why parse general grammars and not just CNF?
  The grammar is intended to reflect the actual structure of the language
  Conversion to CNF completely destroys the parse structure


The CYK Parsing Algorithm


One of the earliest recognition and parsing algorithms
Assumes the grammar is in CNF (and depends on this!)
Based on a dynamic programming approach:
  Build solutions compositionally from sub-solutions
  Store sub-solutions and re-use them whenever necessary
Uses the grammar directly (no PDA is used)
Recognition version: decide whether $w \in L(G)$


The CYK Parsing Algorithm


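Principles of the Algorithm:
  Input is $w = a_1 a_2 \ldots a_n$
  We denote by $w_{ij}$ the substring of $w$ of length $j$ starting with $a_i$, i.e. $w_{ij} = a_i a_{i+1} \ldots a_{i+j-1}$
  For every $w_{ij}$ and for every variable $A$ in the grammar, the algorithm determines if $A \Rightarrow^* w_{ij}$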
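Under the simplifying assumption that each terminal $a_i$ is a single character, the 1-based notation $w_{ij}$ maps onto a Python slice as sketched below. The helper name `substring` is hypothetical and introduced only for illustration.

```python
def substring(w: str, i: int, j: int) -> str:
    """Return w_ij: the substring of w of length j starting at 1-based position i."""
    if i < 1 or j < 1 or i + j - 1 > len(w):
        raise ValueError("w_ij must lie inside w")
    # a_i is w[i - 1] in Python, so w_ij = a_i ... a_{i+j-1} is the slice below.
    return w[i - 1 : i - 1 + j]


w = "baaba"
print(substring(w, 1, len(w)))  # baaba  (w_1n is the whole input)
print(substring(w, 2, 3))       # aab    (a_2 a_3 a_4)
```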


The CYK Parsing Algorithm


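The algorithm works on substrings of increasing length:
  We start with substrings of length 1: $w_{i1} = a_i$, for $1 \le i \le n$
    $A \Rightarrow^* w_{i1}$ if $A \to a_i$ is a rule in the grammar
  We then continue with substrings of length 2, 3, ..., $n$
    For a substring $w_{ij}$, we consider all possible ways of breaking it into two parts $w_{ik}$ and $w_{i+k,\,j-k}$, and $A \Rightarrow^* w_{ij}$ if, for some split $k$ ($1 \le k < j$) and some variables $B, C$:
      1. $A \to BC$ is a rule in the grammar
      2. $B \Rightarrow^* w_{ik}$
      3. $C \Rightarrow^* w_{i+k,\,j-k}$
  Finally, since $w = w_{1n}$, we need to verify that $S \Rightarrow^* w_{1n}$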
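A minimal Python sketch of the recognition version, written directly from the steps above. The grammar encoding (two dictionaries, `unary` and `binary`), the function names, and the sample grammar are all assumptions for illustration, not the slides' own pseudocode; tokens are again taken to be single characters.

```python
from collections import defaultdict
from itertools import product

# Hypothetical CNF grammar encoding (an assumption, not from the slides):
#   unary[a]       = set of variables A with a rule A -> a
#   binary[(B, C)] = set of variables A with a rule A -> B C
Unary = dict[str, set[str]]
Binary = dict[tuple[str, str], set[str]]


def cyk_table(w: str, unary: Unary, binary: Binary) -> dict[tuple[int, int], set[str]]:
    """Return table[(i, j)] = {A : A =>* w_ij}, with i 1-based and j the substring length."""
    n = len(w)
    table: dict[tuple[int, int], set[str]] = defaultdict(set)
    # Length 1: A =>* w_i1 iff A -> a_i is a rule.
    for i in range(1, n + 1):
        table[(i, 1)] = set(unary.get(w[i - 1], set()))
    # Lengths 2..n: split w_ij into w_ik and w_{i+k, j-k} for every 1 <= k < j.
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for k in range(1, j):
                for b, c in product(table[(i, k)], table[(i + k, j - k)]):
                    table[(i, j)] |= binary.get((b, c), set())
    return table


def cyk_recognize(w: str, unary: Unary, binary: Binary, start: str = "S") -> bool:
    """w is in L(G) iff the start variable derives w_1n."""
    if not w:
        return False  # a CNF grammar without S -> epsilon derives no empty string
    return start in cyk_table(w, unary, binary)[(1, len(w))]


# Hypothetical sample grammar: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
UNARY: Unary = {"a": {"A", "C"}, "b": {"B"}}
BINARY: Binary = {("A", "B"): {"S", "C"}, ("B", "C"): {"S"}, ("B", "A"): {"A"}, ("C", "C"): {"B"}}

print(cyk_table("baaba", UNARY, BINARY)[(1, 5)])  # variables deriving the whole input
print(cyk_recognize("baaba", UNARY, BINARY))      # True
print(cyk_recognize("bb", UNARY, BINARY))         # False
```

The loops follow the order described above: substring length $j$, start position $i$, split point $k$, then the pairs of variables found in the two sub-entries. For this sample grammar the entry for the whole input "baaba" comes out as {S, A, C}, so the string is accepted.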



The CYK Parsing Algorithm


We keep the results for every $w_{ij}$ in a table
  Each table entry $(i, j)$ contains the set of variables $A$ such that $A \Rightarrow^* w_{ij}$
Note that we only need to fill in entries up to the diagonal: the longest substring starting at $i$ is of length $n - i + 1$
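Summing the admissible lengths over all start positions gives the number of table entries that are actually filled (a short derivation added for illustration):

```latex
\sum_{i=1}^{n} (n - i + 1) \;=\; \sum_{m=1}^{n} m \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)
```

Since each entry holds a subset of the grammar's variables, and the grammar is fixed, the table needs $O(n^2)$ space.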


Example
Consider the following CNF Grammar:
 " " ' #

Let us run the CYK algorithm on

 ( ' ) " #  " # * ' ) #  * ( ( * ( ( ' )

"


Example

[Figure: CYK table for the example, with indices 1 to 5]


Time Complexity of CYK


We have three nested loops, each with a range of at most 1 to $n$
The individual internal steps (2), (5) and (7) each require constant time, since the grammar is fixed, and thus the number of variables (and hence the size of each table entry) is a constant
Step (7) is the most nested and thus gets executed $O(n^3)$ times
Thus, the entire algorithm requires $O(n^3)$ time
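To make the $O(n^3)$ bound concrete, the sketch below counts how many times the innermost split step runs for an input of length $n$, using the loop bounds of the recognition procedure (lengths $j = 2..n$, starts $i = 1..n-j+1$, splits $k = 1..j-1$). The function name `innermost_steps` is hypothetical, and the sketch does not reproduce the slides' step numbering. The count works out to $\sum_{j=2}^{n} (n-j+1)(j-1) = (n^3 - n)/6$; the work per split over pairs of variables is bounded by a constant that depends only on the fixed grammar.

```python
def innermost_steps(n: int) -> int:
    """Count executions of the innermost (split) step of CYK for an input of length n."""
    count = 0
    for j in range(2, n + 1):              # substring length
        for i in range(1, n - j + 2):      # start position
            for k in range(1, j):          # split point
                count += 1
    return count


for n in (5, 10, 100):
    # Compare the loop count with the closed form (n^3 - n) / 6.
    print(n, innermost_steps(n), (n**3 - n) // 6)
```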


Adding Parsing to CYK


We need to construct parse trees for strings in $L(G)$
Idea: keep back-pointers to the table entries that we combine
At the end, reconstruct a parse from the back-pointers
This allows us to find all parse trees
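One way to add back-pointers to a CYK recognizer, sketched under assumptions: a hypothetical grammar encoding (`unary[a]` is the set of variables with a rule $A \to a$, `binary[(B, C)]` the set with a rule $A \to BC$), single-character tokens, and trees represented as nested tuples. Each entry records, for every variable, how it was derived (a terminal, or a split point `k` with the two child variables), and a recursive generator follows those pointers to enumerate every parse tree.

```python
from collections import defaultdict
from collections.abc import Iterator
from itertools import product

# Hypothetical tree encoding: (A, "a") for A -> a, (A, left, right) for A -> B C.
Tree = tuple


def cyk_parse(w: str, unary: dict, binary: dict, start: str = "S") -> list[Tree]:
    """Return every parse tree of w, reconstructed from back-pointers."""
    n = len(w)
    # back[(i, j)][A] lists each way A =>* w_ij: a terminal string, or (k, B, C).
    back: dict = defaultdict(lambda: defaultdict(list))
    for i in range(1, n + 1):
        for var in unary.get(w[i - 1], ()):
            back[(i, 1)][var].append(w[i - 1])
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for k in range(1, j):
                for b, c in product(list(back[(i, k)]), list(back[(i + k, j - k)])):
                    for var in binary.get((b, c), ()):
                        back[(i, j)][var].append((k, b, c))

    def trees(var: str, i: int, j: int) -> Iterator[Tree]:
        """Unpack all trees rooted at var spanning w_ij by following back-pointers."""
        for bp in back[(i, j)].get(var, []):
            if isinstance(bp, str):
                yield (var, bp)
            else:
                k, b, c = bp
                for left in trees(b, i, k):
                    for right in trees(c, i + k, j - k):
                        yield (var, left, right)

    return list(trees(start, 1, n)) if n else []


# Hypothetical ambiguous grammar: S -> S S | a
UNARY = {"a": {"S"}}
BINARY = {("S", "S"): {"S"}}
for t in cyk_parse("aaa", UNARY, BINARY):
    print(t)
print(len(cyk_parse("aaaa", UNARY, BINARY)))  # 5
```

Enumerating all trees this way can take exponential time on highly ambiguous inputs, even though the table itself stays polynomial in size.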


Example

[Figure: CYK table with back-pointers for the example, with indices 1 to 5]


Parsing with CYK


Efficient Representation of Ambiguities
  Local Ambiguity Packing:
    a Local Ambiguity: multiple ways to derive the same substring from a non-terminal
    All possible ways to derive each non-terminal are stored together
    When creating back-pointers, create a single back-pointer to the packed representation
  Allows us to efficiently represent a very large number of ambiguities (even exponentially many)
  Unpacking: producing one or more of the packed parse trees by following the back-pointers
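Local ambiguity packing can be illustrated by keeping, for each variable in each table entry, one packed record instead of a list of separate trees. In the hypothetical sketch below the packed record is just the number of distinct derivations, computed from the children's counts, so the total number of parses is obtained without unpacking any tree. The grammar encoding, the function name `count_parses`, and the ambiguous sample grammar are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product


def count_parses(w: str, unary: dict, binary: dict, start: str = "S") -> int:
    """Number of parse trees of w, computed from packed entries without unpacking."""
    n = len(w)
    if n == 0:
        return 0
    # packed[(i, j)][A] = number of distinct trees for A =>* w_ij (one record per variable).
    packed: dict = defaultdict(lambda: defaultdict(int))
    for i in range(1, n + 1):
        for var in unary.get(w[i - 1], ()):
            packed[(i, 1)][var] += 1
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for k in range(1, j):
                left, right = packed[(i, k)], packed[(i + k, j - k)]
                for b, c in product(list(left), list(right)):
                    for var in binary.get((b, c), ()):
                        # Trees for this split = (trees for B on the left) x (trees for C on the right).
                        packed[(i, j)][var] += left[b] * right[c]
    return packed[(1, n)].get(start, 0)


# Hypothetical ambiguous grammar: S -> S S | a
UNARY = {"a": {"S"}}
BINARY = {("S", "S"): {"S"}}
print(count_parses("a" * 10, UNARY, BINARY))  # 4862
print(count_parses("a" * 20, UNARY, BINARY))
```

For this grammar the number of parses of $a^n$ grows exponentially (it is the Catalan number $C_{n-1}$), while the packed table still has only $O(n^2)$ entries. Storing the split points and child variables next to each count would allow unpacking any chosen tree on demand.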