multiple meanings of a single sentence, which can cause serious problems and result in an unclear, non-unique meaning of a sentence.

A grammar G is a 4-tuple (NT, T, P, S), where NT is the set of non-terminals (also called variables); T is the set of symbols that the sentences (strings) of the language are built from, also called terminals; S is the start symbol of the grammar, from which all derivations start; and P is the finite set of productions or rules that represent the basic construction rules defining the language [1]. We require that the grammar always contains a production with the start symbol S on the left-hand side [2].

A production is a construction rule with a variable (non-terminal) on one side, which can be replaced by the combination of variables and terminals on the other side of the production (A → α, where A belongs to the set of non-terminals and α is any combination of variables and terminals). At any point the left side can be replaced by the right side [1]. The symbol "→" is written between the left-hand side and the right-hand side. We always start the derivation of a string from the start symbol [1],[2]. All intermediate stages of strings arising in a derivation that starts from the start symbol S are called sentential forms [2]. To verify whether a string can be derived from a grammar, we build a parse tree, also called a derivation tree.

Parsing should determine whether a given string can be derived from the grammar unambiguously. If at any sentential form a situation arises where we have two options, an option to shift as well as an option to reduce, this is called a shift-reduce conflict [3]. It may also be the case that the parser has two options to reduce the sequence, i.e. two productions are present in the grammar by which we can reduce the given sentential form. This is called a reduce-reduce conflict [3]. If any of these conflicts is found during the parsing of a string, the grammar is stated to be ambiguous. The following example illustrates these conflicts.

Example 1: Given the following grammar and the string x = cac,

S → SaS | SbS
A → SaS
S → c

For the string "cac" we have two choices for reducing: with S as well as with A. So this results in a reduce-reduce conflict. For the string x = cacac we again have two choices while parsing the second 'a': a shift option as well as a reduce option, resulting in a shift-reduce conflict.

Chomsky Normal Form, a specific CFG

A grammar G = (NT, T, P, S) is in Chomsky normal form if its productions are restricted to the forms:

X → YZ, where exactly two non-terminals appear on the right (i.e. X, Y, Z ∈ NT), and
X → a, where X is a variable and a is exactly one terminal symbol (i.e. X ∈ NT and a ∈ T).
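Because every CNF production has exactly one of the two shapes above, membership of a string can be decided bottom-up over its substrings; this is the idea behind the CYK algorithm. A minimal sketch (the dictionary encoding with single-letter variables is our own assumption, not from the paper):

```python
def cyk_accepts(grammar, start, s):
    """CYK membership test for a grammar in Chomsky Normal Form.

    grammar maps a variable to its right-hand sides: a 1-character string
    is a terminal rule X -> a, a 2-character string is a rule X -> YZ.
    """
    n = len(s)
    if n == 0:
        return False  # CNF as defined here cannot derive the empty string
    # table[(i, j)] = set of variables that derive the substring s[i:j]
    table = {(i, i + 1): {x for x, rhss in grammar.items() if s[i] in rhss}
             for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            table[(i, j)] = {
                x
                for k in range(i + 1, j)          # split point of s[i:j]
                for x, rhss in grammar.items()
                for rhs in rhss
                if len(rhs) == 2
                and rhs[0] in table[(i, k)]
                and rhs[1] in table[(k, j)]
            }
    return start in table[(0, n)]

# A CNF grammar for {a^n b^n | n > 0}: S -> AB | AX, X -> SB, A -> a, B -> b
G = {"S": ["AB", "AX"], "X": ["SB"], "A": ["a"], "B": ["b"]}
```

The triple loop over substring lengths, start positions and split points is what gives the polynomial (cubic) running time mentioned below.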
Given a string, whether it can be derived from a given grammar or not can be decided by the Cocke-Younger-Kasami (CYK) algorithm, and this can be done in polynomial time. That means an algorithm with polynomial running time can decide whether a string is in the language or not. We can easily show that if a grammar is in Chomsky Normal Form, a binary tree is constructed each time we parse a given string [3].

A grammar in CNF makes parsing a string even easier because of its very simple structure. In an arbitrary CFG we have no upper bound on the length of a derivation of the string to be parsed (there could be many useless rules that later expand into ε), so it is not possible to test membership by generating all possible derivations. In contrast, for a grammar in CNF there is an upper limit on the length of any derivation of a string of length n: it is exactly 2n − 1 steps.

The normal forms (Chomsky, Greibach, etc.) were invented to solve the elementary problems involving CFLs, such as deciding membership and testing emptiness, more easily. The Chomsky normal form of a grammar yields efficient algorithms.

Figure 2: Representation of a parse tree for an English-language sentence

The class of languages recognized by context-free grammars is known as the context-free languages. Context-free languages are rich enough to describe the syntax of the languages that we use to write programs. One subclass of the context-free grammars is the LALR(1) grammars (Look-Ahead Left-to-Right). In these grammars we try to reduce the given string by looking at the symbol that comes next. Instead of predicting the entire sequence, an LALR parser just looks at the next single symbol. Based on that symbol it can perform two kinds of actions, namely shift or reduce. If the symbol under consideration, together with the previous sequence of symbols (not reducible to the left-hand side of any of the given production rules), does not match the right-hand side of any production, a shift operation takes place. Starting from the given string, if the parser is able to reach the start symbol of the grammar, it produces a unique derivation, i.e. only a single derivation for that string. By this process the repetition of the same derivation is stopped and some patterns are expanded only once [13].

To study this, let us take an example grammar. We take a grammar which is infinite and unambiguous; it can generate an endless number of derivations recursively:

S → V1 | V2
V1 → abV1 | ab
V2 → cdV2 | cd

The expansion at its first stage gives the strings "V1" and "V2". Expanding further, the strings "abV1", "ab", "cdV2" and "cd" are generated, after which the expansion is terminated. If we look at the strings "abV1" and "cdV2", they have altogether dissimilar prefixes, and the remaining parts "V1" and "V2" have already been expanded during the second expansion. So the algorithm can now be terminated and unambiguity reported successfully.

AMBER

AMBER was developed by Schroer [14]. To generate strings, AMBER uses an Earley parser [15]. The generated strings are all checked for duplicates. This works much like Gorn's method, with some minor variations: the paths that result in all derivations need to be traced, which is basically the same as in Gorn's method [12]. The method can take some parameters, with some deviations from the previous methods, to find the ambiguity. Not only are all the strings compared, but there is also an "ellipsis" alternative to match all intermediate forms as well. This way we can conclude about ambiguity in a smaller number of steps. The search can be bounded at a certain stage by applying a maximum-length condition, or an upper limit can be put on the number of strings generated [15]. If we combine this parameter with another option (limiting the expansion) applied at each step, certain derivations can be stopped from being parsed.

Jampana Method

Jampana's ambiguity detection method works on a grammar in Chomsky Normal Form (CNF). It assumes that ambiguity can be present in a string only if a duplicate (live) production is present in its derivation, and not otherwise. The method follows this principle; the rest is the same as searching for duplicate strings, as in Gorn's method. The only requirement is that the grammar must be in CNF [16].

LR(k) test

This is a parser which uses a parse table to do the parsing. Parsing is a process of finding a tree structure, also called a parse tree, by which we are able to reach the start symbol starting from the given string. The basis for each decision is the symbol which is treated as look-ahead. If we use k symbols as look-ahead, the parser is known as LR(k): at every step, k symbols are taken into consideration for reducing them to a variable. If a variable has these k symbols on the right-hand side of one of its productions, the symbols can be reduced to that variable at this step [17]. This process is done using a parse table. The following actions are possible: first, the next symbol is shifted; second, a reduction with a variable which contains the symbols in its right-hand side; third, the symbols are reduced to the start symbol (showing acceptance of the string); or, at last, an error is reported. The grammar is called an LR(k) grammar if we can parse deterministically with this algorithm; the table constructed by this method then has entries for k symbols with no conflicts. A conflict is a state where more than one action is possible. A non-deterministic situation for k look-ahead symbols indicates either a shift-reduce or a reduce-reduce conflict [17].

If we obtain a parse table with no non-deterministic situation of the above type, then every string in the grammar's language has a unique parse tree. This results in an unambiguous grammar. But grammars of this type are only a subset of the context-free grammars; they do not cover the entire class [20]. For a grammar that does not belong to LR(k), we cannot say anything using this method. Only if the grammar belongs to LR(k) can verification be done, by creating its derivation table for k symbols. A parse table without conflicts allows us to conclude that the grammar is LR(k) and is unambiguous.

Brabrand, Giegerich and Møller Method

The method of Brabrand et al. [18] uses horizontal and vertical ambiguity as the basis for detecting ambiguity in a grammar. Every production rule is checked for horizontal ambiguity, and every combination of productions with exactly the same non-terminal on the left is tested for vertical ambiguity. In this way, ambiguity (vertical or horizontal) is characterized in terms of the language; in the previous methods it was characterized using the grammar itself. Vertical ambiguity is verified by finding the common strings generated by two production rules: say the first production generates the set L1 and the second generates the set L2; we take the intersection of L1 and L2, and if it is not empty, vertical ambiguity is verified. If a production's right-hand side can be broken into parts whose languages overlap, horizontal ambiguity is verified for that rule. So in this method the languages generated by the productions are taken into consideration, not the actual production rules. Approximation is performed on the languages generated by the production rules, which makes the intersection and overlap operations decidable for various choices of approximation method.
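Brabrand et al. decide emptiness of such intersections on regular approximations of the languages; the underlying idea can be illustrated (this is a rough illustration, not their approximation construction) by intersecting bounded-length samples of the languages of two alternatives:

```python
from collections import deque

def bounded_language(grammar, form, max_len):
    """Terminal strings of length <= max_len derivable from the sentential
    form, by leftmost expansion.  Pruning by length is sound here only
    because the grammar has no null productions, so forms never shrink."""
    seen, out, queue = {form}, set(), deque([form])
    while queue:
        cur = queue.popleft()
        nts = [i for i, ch in enumerate(cur) if ch in grammar]
        if not nts:
            out.add(cur)          # no non-terminals left: a terminal string
            continue
        i = nts[0]                # expand the leftmost non-terminal
        for rhs in grammar[cur[i]]:
            nxt = cur[:i] + rhs + cur[i + 1:]
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return out

# S -> aS | Sa | a: the two recursive alternatives derive common strings,
# so the grammar is vertically ambiguous on S.
G = {"S": ["aS", "Sa", "a"]}
common = bounded_language(G, "aS", 4) & bounded_language(G, "Sa", 4)
```

A non-empty intersection of the samples is a concrete ambiguity witness; an empty one proves nothing, which is exactly why the exact method works on decidable approximations instead.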
Proposed Technique

The following steps are used for ambiguity detection in the grammar:

1. Convert the given grammar into Chomsky normal form (CNF).
2. Compute the First and Last functions for every production P.
3. Identify the productions which have a possibility of vertical and horizontal ambiguity.
4. Check the vertical and horizontal ambiguity for those productions.

1. Conversion of Grammar into Chomsky Normal Form

The following steps are used for converting a CFG into CNF: removal of useless symbols, removal of null productions, and removal of unit productions.

Removal of Useless Symbols

Useless symbols are the variables which do not appear in any string generated from the starting non-terminal.

Algorithm 1: Identify the useless non-terminals in grammar G
Input: A grammar G(NT, T, P, S)
Output: Non-terminals which are not reachable from the start symbol S
1. Wold ← ∅
2. Wnew ← {S}
3. Repeat steps 4 to 7 while Wold ≠ Wnew
4.   Wold ← Wnew
5.   for A ∈ Wold do
6.     for (A → w) ∈ P do
7.       add all variables appearing in w to Wnew
8. return Wnew

Algorithm 2: Identify the non-terminals of G that generate terminal strings
Input: A grammar G(NT, T, P, S)
Output: Non-terminals which derive strings consisting only of terminals
1. Wold ← ∅
2. Wnew ← Wold
3. do
4.   Wold ← Wnew
5.   For A ∈ NT do
6.     For (A → w) ∈ P do
7.       If w ∈ (T ∪ Wold)* then
8.         Wnew ← Wnew ∪ {A}
9. While (Wold ≠ Wnew)
Return Wnew

The output grammar is free from productions which cannot generate strings consisting entirely of terminals.

Removal of Null Productions

Given a grammar G = (NT, T, P, S), a production A → ε (where A is in NT) generates the null string and is called a null-production. To remove these types of productions from the grammar, apply the following algorithm.

Algorithm 3: CompNullable(G)
Input: A grammar G(NT, T, P, S)
Output: Non-terminals which generate the null string
1. Wnull ← ∅, Wold ← ∅
2. do
3.   Wold ← Wold ∪ Wnull
4.   For A ∈ NT do
5.     for (A → w) ∈ P do
6.       if w = ε or w ∈ (Wnull)*
7.         Wnull ← Wnull ∪ {A}
8. while (Wnull ≠ Wold)
Return Wnull
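Algorithms 1-3 are all fixed-point iterations over a set of variables, so they can be sketched together (the encoding of a grammar as a dict from variable to a list of right-hand-side strings is our own assumption):

```python
def reachable(grammar, start):
    """Algorithm 1: variables reachable from the start symbol S."""
    new, old = {start}, set()
    while old != new:
        old = set(new)
        for a in old:
            for rhs in grammar.get(a, []):
                new.update(ch for ch in rhs if ch in grammar)
    return new

def generating(grammar, terminals):
    """Algorithm 2: variables with some rhs w such that w is in (T ∪ Wold)*,
    i.e. variables that can derive a string of terminals."""
    new, old = set(), None
    while old != new:
        old = set(new)
        for a, rhss in grammar.items():
            if any(all(ch in terminals or ch in old for ch in rhs) for rhs in rhss):
                new.add(a)
    return new

def nullable(grammar):
    """Algorithm 3 (CompNullable): variables that derive the empty string."""
    wnull, old = set(), None
    while old != wnull:
        old = set(wnull)
        for a, rhss in grammar.items():
            # an empty rhs satisfies the all() vacuously, covering w = ε
            if any(all(ch in old for ch in rhs) for rhs in rhss):
                wnull.add(a)
    return wnull

# C is unreachable and non-generating; A is nullable via A -> ε
G = {"S": ["AB", "a"], "A": ["aA", ""], "B": ["b"], "C": ["cC"]}
```

Each loop only ever grows its set, and the sets are bounded by NT, so all three iterations terminate.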
Step 3: Removal of Unit Productions

Algorithm 5: Removal of unit productions
Input: A grammar G
Output: Productions which do not contain unit productions

For example, for a grammar containing the unit productions V1 → V2 | S and V2 → b, after removal of the unit productions grammar G is

S → V1SV1 | aS | a | SV1 | V1S
V2 → b

Making productions with two variables on the right-hand side: it is possible that the grammar contains productions of the form X → V1V2V3 … Vn, with more than two variables on the right. Such productions are replaced by chains of productions with exactly two variables on the right, e.g.

S → V1V3 | V4V2 | a | SV1 | V1S
V1 → b | V1V3 | V4V2 | a | SV1 | V1S
V2 → b
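The splitting of a long right-hand side X → V1V2 … Vn into two-variable productions can be sketched as follows (right-hand sides are lists of symbols; the fresh-variable naming scheme is our own):

```python
def binarize(grammar):
    """Replace every production X -> V1 V2 ... Vn (n > 2) by a chain
    X -> V1 N1, N1 -> V2 N2, ..., ending in a two-symbol production."""
    out = {}
    counter = 0
    for x, rhss in grammar.items():
        for rhs in rhss:
            lhs, rest = x, list(rhs)
            while len(rest) > 2:
                counter += 1
                fresh = f"N{counter}"      # hypothetical fresh variable name
                out.setdefault(lhs, []).append([rest[0], fresh])
                lhs, rest = fresh, rest[1:]
            out.setdefault(lhs, []).append(rest)
    return out

G = {"S": [["V1", "V2", "V3", "V4"], ["a"]]}
H = binarize(G)
```

Each long production of length n is replaced by n − 1 binary productions, which is why the CNF conversion only grows the grammar linearly in this step.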
The following helper functions are used in the algorithms below:

LHS(P): returns the non-terminal defined by production P.
InternalFirst(V): returns an iterator that visits each production for non-terminal V.
VisitedFirst(X): returns an iterator that visits each occurrence of X in the RHS of all rules.
SymbolDerivesEmpty(X): checks whether the non-terminal symbol X derives the empty string or not.

Computation of the First function

First function: the First function of a non-terminal is the set of first symbols obtainable from the right-hand sides of its productions. The First function is computed by scanning the production from left to right.

Computation of the Last function

The Last function is computed by reversing the productions and calculating the First function of the reversed productions, which is the Last function for the given productions.

Detection of Vertical Ambiguity Productions

Given a grammar G′ in CNF form, first check all the productions which have the possibility of ambiguity. All productions are checked with the function CheckProduction(), which then returns the LHS values, consisting of non-terminals.
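The First and Last computations just described can be sketched as a fixed point; Last is obtained, exactly as stated, by reversing every right-hand side and computing First of the reversed grammar (assuming no null productions remain after the CNF conversion; the dict encoding is ours):

```python
def first_sets(grammar, terminals):
    """First(A) for every variable A, assuming no null productions."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                x = rhs[0]                       # leftmost symbol decides First
                add = {x} if x in terminals else first[x]
                if not add <= first[a]:
                    first[a] |= add
                    changed = True
    return first

def last_sets(grammar, terminals):
    """Last(A): reverse each right-hand side and take First of the result."""
    reversed_g = {a: [rhs[::-1] for rhs in rhss] for a, rhss in grammar.items()}
    return first_sets(reversed_g, terminals)

G = {"S": ["aSb", "c"]}
```

For this grammar, First(S) = {a, c} and Last(S) = {b, c}, matching the reverse-and-take-First description above.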
Algorithm 6: ComputeFirst(α)
Input: Grammar G′
Output: First functions corresponding to grammar G′

First(α): Set
1. For each A ∈ Nonterminals() do VisitedFirst(A) ← false
2. First ← InternalFirst(α)
Return (First)
End

Function InternalFirst(Xβ): Set
1. If Xβ = ε then return (∅)
2. If X is a terminal
3.   then return ({X})
4. First ← ∅  (X is a non-terminal)
5. If not VisitedFirst(X) then
6.   VisitedFirst(X) ← true
7.   For each rhs ∈ ProductionsFor(X) do
8.     First ← First ∪ InternalFirst(rhs)
9. If SymbolDerivesEmpty(X)
10.  then First ← First ∪ InternalFirst(β)
Return (First)

First(α) is computed by invoking InternalFirst(α). Before any sets are computed, VisitedFirst(A) is set to false for each non-terminal A; VisitedFirst(X) indicates that the productions of X already participate in the computation of First(α).

Algorithm 7: CheckVProduction()
Input: Grammar G′ containing productions P1, P2, …, Pi
Output: Productions P ∈ {P1, …, Pi} containing vertical ambiguity

Procedure CheckVProduction()
1. For each production P do
2.   Visited(LHS(P)) ← true
3.   If Count(RHS(P)) > 1, mark P as a candidate
Return LHS(P)

// This method checks the vertical ambiguity in the input grammar
Procedure CheckVambiguity()
1. For each LHS(P) with productions of the type A → P1 | P2 for a non-terminal A do
2.   RHS(P) ← FirstLast(P1, P2)
3.   CALL FirstLast(P1)
4.   CALL FirstLast(P2)
5.   If FirstLast(P1) ∩ FirstLast(P2) ≠ ∅ then
     LHS(P) contains vertical ambiguity

Detection of Horizontal Ambiguity Productions

Given a grammar in CNF form, first check all the productions which have the possibility of ambiguity, and then detect the horizontal ambiguity in these productions.

Algorithm 8: CheckHProduction()
Input: Grammar G′ containing productions P1, P2, …, Pi
Output: Productions P ∈ {P1, …, Pi} containing horizontal ambiguity

Procedure CheckHProduction()
1. For each production P do
2.   Visited(LHS(P)) ← true
3.   Corresponding(RHS(P)) ← true
4.   If RHS(P) contains more than one non-terminal (Nonterminal(P) > 1), mark P as a candidate
Return LHS(P)

// This method checks the horizontal ambiguity in the input grammar
Procedure CheckHambiguity()
1. For each production of the type A → P1P2 do
2.   Call Nonterminal()
3.   For each Nonterminal(P1, P2) do
4.     Call First(P1, P2)
5.     Call Last(P1, P2)
6.     FirstLast(P1) = First(P1) ∪ Last(P1)
7.     FirstLast(P2) = First(P2) ∪ Last(P2)
8.     Production(P1)h = FirstLast(P1)
9.     Production(P2)h = FirstLast(P2)
10.    If [Production(P1)h ∩ Production(P2)h ≠ ∅] then
       LHS(P) contains horizontal ambiguity
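Under the assumptions of Algorithms 7 and 8, the candidate checks over precomputed First/Last sets can be sketched like this (the set encodings and the helper name firstlast are ours, not from the paper):

```python
def firstlast(sym, first, last):
    """First ∪ Last of one grammar symbol; a terminal is its own First and Last."""
    return (first[sym] | last[sym]) if sym in first else {sym}

def vertical_candidates(grammar, first, last):
    """Variables with two alternatives whose FirstLast sets overlap (Algorithm 7)."""
    hits = []
    for a, rhss in grammar.items():
        for i in range(len(rhss)):
            for j in range(i + 1, len(rhss)):
                fl_i = set().union(*(firstlast(s, first, last) for s in rhss[i]))
                fl_j = set().union(*(firstlast(s, first, last) for s in rhss[j]))
                if fl_i & fl_j:
                    hits.append((a, rhss[i], rhss[j]))
    return hits

def horizontal_candidates(grammar, first, last):
    """CNF rules A -> YZ whose two parts have overlapping FirstLast sets (Algorithm 8)."""
    return [(a, rhs)
            for a, rhss in grammar.items()
            for rhs in rhss
            if len(rhs) == 2
            and firstlast(rhs[0], first, last) & firstlast(rhs[1], first, last)]

# A toy CNF grammar where both checks fire: S -> AB | AA, A -> a, B -> a
G = {"S": ["AB", "AA"], "A": ["a"], "B": ["a"]}
F = {"S": {"a"}, "A": {"a"}, "B": {"a"}}
L = {"S": {"a"}, "A": {"a"}, "B": {"a"}}
```

As in the paper's method, a reported pair is only a candidate: overlapping First/Last sets signal a possibility of ambiguity, not a proof of it.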
Implementation

This section describes the implementation of the proposed algorithms on various grammars. The proposed algorithms have been applied to a set of grammars in CNF form, into which a given CFG can easily be converted. The test grammars included are of different sizes: ambiguous grammars which contain horizontal or vertical ambiguity, and unambiguous grammars.

Grammars are stored as text files which should end with the .txt extension. They are specified as follows:

Each line consists of space-separated strings. The first string of a line must have length one and is a variable; the remaining strings represent the right-hand sides of the productions from the variable for that line.
The first variable on the first line is always the start variable.
Any character which is not a variable is a terminal.

For example, the language {a^n b^n | n > 0} ∪ {b^n a^n | n > 0} has the context-free grammar

S → X | Y
X → aXb | ab
Y → bYa | ba

Grammar G is stored in the txt file as follows:

S X Y
X aXb ab
Y bYa ba

The proposed technique for ambiguity detection in context-free grammars is implemented in Java. The output is shown in the following snapshots.

Figure 4: Demonstration of conversion of the input grammar into CNF

Figure 4 shows the grammar information of the input grammar file gg.txt and its CNF form. If the input grammar is already in CNF, the unchanged grammar is shown. Grammar information such as the number of variables, the start symbol and the number of productions is also shown in the output.

Figure 7 shows another grammar taken as input, for checking horizontal ambiguity.
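A loader for the grammar file format described above can be sketched as follows (parse_grammar is our name; the format itself is the one specified in this section):

```python
def parse_grammar(text):
    """Read the .txt grammar format: on each line the first one-character
    string is the variable, the remaining space-separated strings are the
    right-hand sides of its productions; the first line's variable is the
    start symbol, and every non-variable character is a terminal."""
    grammar, start = {}, None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue                      # skip blank lines
        var, rhss = parts[0], parts[1:]
        if start is None:
            start = var                   # first variable seen = start symbol
        grammar.setdefault(var, []).extend(rhss)
    variables = set(grammar)
    terminals = {ch for rhss in grammar.values()
                 for rhs in rhss for ch in rhs} - variables
    return grammar, start, terminals

text = """S X Y
X aXb ab
Y bYa ba"""
g, s, t = parse_grammar(text)
```

Note that the terminals are not declared anywhere in the file; they are inferred, as the specification says, as every character that never appears as the first string of a line.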
Figure 5: First function of the input grammar

Figure 5 shows the First and Last symbols of the input grammar, which is now in CNF. With the help of the First and Last symbols of the given grammar, the horizontal and vertical ambiguity is calculated.

The algorithm proposed here has been tested with some known grammars, i.e. grammars whose ambiguity or non-ambiguity (vertical or horizontal) is known in advance, and the implementation gave correct results. Nevertheless, detecting ambiguity is undecidable for an arbitrary context-free grammar. That means that even though the proposed algorithm halts and gives an output stating whether a given arbitrary grammar is ambiguous or not, the result is not provable in general.

Conclusion and Future Work
We have presented a technique for statically analyzing the ambiguity of context-free grammars, based on a linguistic characterization. This thesis gives the ambiguity detection technique together with its background work. The presented algorithms, implemented in Java, identify the horizontal and vertical ambiguity of a context-free grammar after converting it into CNF, because the CNF form is easily handled in parsing and is beneficial in terms of computation. The presented algorithm is less complex than the others and is applicable to simple grammars.