Remember that the main difference between a DFA and an NFA is that a DFA has no epsilon (ε) transitions,
which represent "nothing" or "no input" between states, and that each DFA state has at most one transition per
input symbol.
As described in the section DFA versus NFA in the introduction of this series of posts, it may be shown that a
DFA is equivalent to an NFA, in that, for any given NFA, one may construct an equivalent DFA, and vice-versa:
this is the powerset construction or subset construction.
//
// Regular Expression Engine C# Sample Application
// 2006, by Leniel Braz de Oliveira Macaferi & Wellington Magalhães Leite.
//
// UBM's Computer Engineering - 7th term [http://www.ubm.br/]
//
// This program sample was developed and turned in as a term paper for Lab. of
// Compilers Construction. It was based on the source code provided by Eli Bendersky
// [http://eli.thegreenplace.net/] and is provided "as is" without warranty.
//
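using System;
using SCG = System.Collections.Generic;
using C5;
// Type aliases used throughout the listing: a state is an int, an input symbol is a char.
using state = System.Int32;
using input = System.Char;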
namespace RegularExpressionEngine
{
/// <summary>
/// Implements a deterministic finite automaton (DFA)
/// </summary>
class DFA
{
// Start state
public state start;
// Set of final states
public Set<state> final;
// Transition table
public SCG.SortedList<KeyValuePair<state, input>, state> transTable;
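public DFA()
{
final = new Set<state>();
// The table needs the custom Comparer defined below to order its keys.
transTable = new SCG.SortedList<KeyValuePair<state, input>, state>(new Comparer());
}
// Runs the DFA over the input string. The method name Simulate is assumed;
// the original signature was lost.
public string Simulate(string @in)
{
state curState = start;
CharEnumerator i = @in.GetEnumerator();
while(i.MoveNext())
{
KeyValuePair<state, input> transition = new KeyValuePair<state, input>(curState, i.Current);
// No transition for this (state, input) pair: the input is rejected.
if(!transTable.ContainsKey(transition))
return "Rejected";
curState = transTable[transition];
}
if(final.Contains(curState))
return "Accepted";
else
return "Rejected";
}
// Prints the DFA's final states. The method name Show is assumed;
// the original signature was lost.
public void Show()
{
SCG.IEnumerator<state> iE = final.GetEnumerator();
while(iE.MoveNext())
Console.Write(iE.Current + " ");
Console.Write("\n\n");
}
}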
/// <summary>
/// Implements a comparer that suits the transTable SortedList
/// </summary>
public class Comparer : SCG.IComparer<KeyValuePair<state, input>>
{
// Orders transitions by source state first, then by input symbol.
public int Compare(KeyValuePair<state, input> transition1, KeyValuePair<state, input> transition2)
{
if(transition1.Key == transition2.Key)
return transition1.Value.CompareTo(transition2.Value);
else
return transition1.Key.CompareTo(transition2.Key);
}
}
}
As you see, a DFA has 3 variables: a start state, a set of final states and a transition table that maps each
(state, input) pair to the next state.
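To make the role of these three variables concrete, here is a minimal sketch in modern C# that uses only the base class library instead of C5. The toy DFA (for the regex ab*) and the name Accepts are my own assumptions for illustration; they are not part of the sample application.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical DFA for the regex ab*; not the DFA built in this post.
int start = 0;
var finals = new HashSet<int> { 1 };
var transTable = new Dictionary<(int State, char Input), int>
{
    [(0, 'a')] = 1,
    [(1, 'b')] = 1,
};

// Walks the input one symbol at a time; a missing transition rejects at once.
bool Accepts(string text)
{
    int cur = start;
    foreach (char c in text)
    {
        if (!transTable.TryGetValue((cur, c), out int next))
            return false;
        cur = next;
    }
    // The input is accepted only if we stop in a final state.
    return finals.Contains(cur);
}

Console.WriteLine(Accepts("abbb")); // True
Console.WriteLine(Accepts("ba"));   // False
```

The lookup key is the (state, input) pair, which is the same idea as the KeyValuePair<state, input> key of transTable in the listing above.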
Below I present the SubsetMachine class that is responsible for the hard work of extracting an equivalent DFA
from a given NFA:
//
// Regular Expression Engine C# Sample Application
// 2006, by Leniel Braz de Oliveira Macaferi & Wellington Magalhães Leite.
//
// UBM's Computer Engineering - 7th term [http://www.ubm.br/]
//
// This program sample was developed and turned in as a term paper for Lab. of
// Compilers Construction. It was based on the source code provided by Eli Bendersky
// [http://eli.thegreenplace.net/] and is provided "as is" without warranty.
//
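using System;
using SCG = System.Collections.Generic;
using C5;
// Type aliases used throughout the listing: a state is an int, an input symbol is a char.
using state = System.Int32;
using input = System.Char;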
namespace RegularExpressionEngine
{
class SubsetMachine
{
private static int num = 0;
/// <summary>
/// Subset machine that employs the powerset construction or subset construction algorithm.
/// It creates a DFA that recognizes the same language as the given NFA.
/// </summary>
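public static DFA SubsetConstruct(NFA nfa)
{
DFA dfa = new DFA();
// Sets of NFA states; each inner set is represented by one DFA state.
Set<Set<state>> markedStates = new Set<Set<state>>();
Set<Set<state>> unmarkedStates = new Set<Set<state>>();
// Gives a number to each state in the DFA.
HashDictionary<Set<state>, state> dfaStateNum = new HashDictionary<Set<state>, state>();
// The field name nfa.initial is assumed; the original line was lost.
Set<state> nfaInitial = new Set<state>();
nfaInitial.Add(nfa.initial);
// Initially, the epsilon closure of the NFA's initial state is the only DFA state
// and it is unmarked.
Set<state> first = EpsilonClosure(nfa, nfaInitial);
unmarkedStates.Add(first);
// The initial DFA state
state dfaInitial = GenNewState();
dfaStateNum[first] = dfaInitial;
dfa.start = dfaInitial;
while(unmarkedStates.Count != 0)
{
// Takes out one unmarked state and then marks it.
Set<state> aState = unmarkedStates.Choose();
unmarkedStates.Remove(aState);
markedStates.Add(aState);
// If this state contains the NFA's final state, add it to the DFA's set of
// final states.
if(aState.Contains(nfa.final))
dfa.final.Add(dfaStateNum[aState]);
SCG.IEnumerator<input> iE = nfa.inputs.GetEnumerator();
// For each input symbol the NFA knows
while(iE.MoveNext())
{
// Next DFA state
Set<state> next = EpsilonClosure(nfa, nfa.Move(aState, iE.Current));
// If we haven't examined this state before, add it to the unmarkedStates and
// make up a new number for it.
if(!unmarkedStates.Contains(next) && !markedStates.Contains(next))
{
unmarkedStates.Add(next);
dfaStateNum.Add(next, GenNewState());
}
KeyValuePair<state, input> transition = new KeyValuePair<state, input>(dfaStateNum[aState], iE.Current);
dfa.transTable[transition] = dfaStateNum[next];
}
}
return dfa;
}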
/// <summary>
/// Builds the Epsilon closure of states for the given NFA
/// </summary>
/// <param name="nfa"></param>
/// <param name="states"></param>
/// <returns></returns>
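static Set<state> EpsilonClosure(NFA nfa, Set<state> states)
{
// Push all states onto a stack
SCG.Stack<state> uncheckedStack = new SCG.Stack<state>(states);
// The closure starts out as the given states themselves
Set<state> epsilonClosure = states;
while(uncheckedStack.Count != 0)
{
// Pop state t, the top element, off the stack
state t = uncheckedStack.Pop();
int i = 0;
// For each state u with an edge from t to u labeled epsilon. This loop is a
// reconstruction: it assumes nfa.transTable[t][u] holds the symbol on the edge
// from t to u and that NFA.Constants.Epsilon marks an eps-transition.
foreach(input symbol in nfa.transTable[t])
{
if(symbol == (char)NFA.Constants.Epsilon)
{
state u = i;
// If u is not already in the closure, add it and push it onto the stack
if(!epsilonClosure.Contains(u))
{
epsilonClosure.Add(u);
uncheckedStack.Push(u);
}
}
i = i + 1;
}
}
return epsilonClosure;
}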
/// <summary>
/// Creates unique state numbers for DFA states
/// </summary>
/// <returns></returns>
private static state GenNewState()
{
return num++;
}
}
}
In the first post of this series we see the following line of code:
The SubsetConstruct method from the SubsetMachine class receives as input an NFA and returns a DFA.
Inside the SubsetConstruct method we first instantiate a new DFA object and then create two variables,
markedStates and unmarkedStates. Each is a set of sets of NFA states, and each inner set represents one DFA state.
From this we see that a DFA state can represent a set of NFA states. Take a look at the introductory post and see
Figure 2. It shows two DFA states that represent sets of NFA states, in this particular case the DFA final states
represent the NFA states {s2, s3} and {s5, s6}.
The HashDictionary dfaStateNum lets us give each DFA state a name (a number).
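Using a whole set of NFA states as a dictionary key only works if two sets with the same members count as the same key. The following sketch shows that idea with the base class library rather than C5: HashSet<T>.CreateSetComparer() supplies the content-based comparison. The state numbers are taken from the {s2, s3} and {s5, s6} example above; the DFA numbers 0 and 1 are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Keys compare by content, not by reference, thanks to the set comparer.
var dfaStateNum = new Dictionary<HashSet<int>, int>(HashSet<int>.CreateSetComparer());

dfaStateNum[new HashSet<int> { 2, 3 }] = 0;
dfaStateNum[new HashSet<int> { 5, 6 }] = 1;

// A different HashSet instance with the same members finds the same entry.
bool found = dfaStateNum.TryGetValue(new HashSet<int> { 3, 2 }, out int number);
Console.WriteLine($"{found} {number}"); // True 0
```

Without the comparer, the lookup would compare object references and the second instance would not be found.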
For the sake of comparison I’ll show the NFA’s graph representation for the regex (l|e)*n?(i|e)el* that
we’ve been studying since the beginning of this series.
Figure 2 - NFA’s graph representation for the regex (l|e)*n?(i|e)el*
If you pay close attention you’ll see that the order in which the regex parser found the states is the same order
we follow when we visually debug the code against the graph above.
With these states found, we add this DFA state to the unmarkedStates variable.
We then use a function called GenNewState that is responsible for generating a number that uniquely identifies
each state of the DFA:
When we pass to the next line of code we add to the dfaStateNum dictionary a key that is the set of states
returned by the EpsilonClosure function and a value that is the name of the initial state of the DFA.
dfaStateNum[first] = dfaInitial;
We make the initial state of the DFA be the dfaInitial value we just got.
dfa.start = dfaInitial;
Next we enter the first while loop. In this loop we take one state out of unmarkedStates and add it to the
markedStates set, which records that this state has already been checked.
// If this state contains the NFA's final state, add it to the DFA's set of final states.
if(aState.Contains(nfa.final))
dfa.final.Add(dfaStateNum[aState]);
Now it’s time to check against the NFA’s input symbols. To accomplish this we declare an enumerator of type
input that moves through each of the input symbols in the next while code block:
SCG.IEnumerator<input> iE = nfa.inputs.GetEnumerator();
As you see we call the function Move that is part of the NFA class. This function receives as parameters a set of
states and an input symbol to be checked against. It returns a set of states.
What the Move function does is this: for each state in the set passed as the first parameter, we check each
transition in the NFA’s transition table that leaves this state on the input symbol passed as the second
parameter, and we collect the states those transitions lead to.
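As a rough illustration of that description, here is a minimal sketch of a Move-like function in modern C#. It is not the Move method of the NFA class from this series: the edge table, its (state, symbol) keys and the state numbers are all hypothetical, chosen only to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical NFA fragment: edges keyed by (state, symbol).
var edges = new Dictionary<(int From, char Symbol), int[]>
{
    [(0, 'e')] = new[] { 1, 2 },
    [(1, 'e')] = new[] { 2 },
    [(1, 'l')] = new[] { 3 },
};

// Collects every state reachable from any state in `states` over one `symbol` edge.
SortedSet<int> Move(IEnumerable<int> states, char symbol)
{
    var result = new SortedSet<int>();
    foreach (int s in states)
        if (edges.TryGetValue((s, symbol), out var targets))
            result.UnionWith(targets);
    return result;
}

Console.WriteLine(string.Join(", ", Move(new[] { 0, 1 }, 'e'))); // 1, 2
```

Note that Move follows only edges labeled with the given symbol; eps-transitions are handled separately by the epsilon closure.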
So, the first time we pass we get the following output from the Move function:
Figure 3 - Result from the NFA’s Move function the 1st time it’s called
If we look at Figure 2 we can assert that from the states present in the first state of the DFA (see Figure 1) we
can move to states {5, 16} with the first NFA input that is equal to ‘e’.
With the above result taken from the Move function we’re ready to go to the EpsilonClosure function for the
second time to create the 2nd DFA state in the SubsetMachine class. This second time we get the following result
from the EpsilonClosure function:
Figure 4 - Result from the EpsilonClosure function the 2nd time it’s called
Now, if you pay close attention, we can assert that starting at the states {5, 16} we can move with an
eps-transition to the states shown above. Remember that the states we pass to the EpsilonClosure function are
themselves included in the result returned by the function.
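The stack-based closure can be sketched on its own as follows. This is a simplified illustration in modern C#, not the EpsilonClosure method of the SubsetMachine class: the eps-edge table and the state numbers are hypothetical and do not match Figure 2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical eps-edges: state -> states reachable by one eps-transition.
var epsEdges = new Dictionary<int, int[]>
{
    [1] = new[] { 2 },
    [2] = new[] { 3, 1 },   // the cycle back to 1 must not loop forever
    [6] = new[] { 7 },
};

SortedSet<int> EpsilonClosure(IEnumerable<int> states)
{
    // The input states themselves belong to the result.
    var closure = new SortedSet<int>(states);
    var pending = new Stack<int>(closure);
    while (pending.Count != 0)
    {
        int t = pending.Pop();
        if (!epsEdges.TryGetValue(t, out var targets))
            continue;
        foreach (int u in targets)
            if (closure.Add(u))     // Add returns false when u was already present
                pending.Push(u);
    }
    return closure;
}

Console.WriteLine(string.Join(", ", EpsilonClosure(new[] { 1, 6 }))); // 1, 2, 3, 6, 7
```

Pushing a state only when it is newly added to the closure is what guarantees termination even when the eps-edges form a cycle.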
Now that we have created the 2nd DFA state, we check whether it has already been examined. If it hasn’t, we add
it to the unmarkedStates variable and give this state a new number with the GenNewState function.
// If we haven't examined this state before, add it to the unmarkedStates and make up a
// new number for it.
if(!unmarkedStates.Contains(next) && !markedStates.Contains(next))
{
unmarkedStates.Add(next);
dfaStateNum.Add(next, GenNewState());
}
We create a new transition that has as key the number of the DFA state we’re checking and as the value the
current input symbol we’re after.
This has the following meaning: from state 0 with input ‘e’ go to state 1!
These are the subsequent values we get for the first unmarkedState we’re checking:
With input ‘i’ we can go to state { 14 } from which with an eps transition we can go to state { 17 }.
With input ‘l’ we can go to state { 3 } from which with an eps transition we can go to states { 4, 13, 8, 3, 12, 7, 2,
11, 6, 1, 15, 10 }.
With input ‘n’ we can go to state { 9 } from which with an eps transition we can go to states { 12, 9, 13, 15 }.
A point that deserves consideration is that each time you run the regex parser it’s not guaranteed that the numbers
that identify the DFA states will remain the same.
I won’t continue debugging because it would consume a lot of space in this blog post.
I think that with the above explanation it’s easy to get the point.
In short, we repeat the above steps for each unmarked state, checking it against each input symbol, until no
unmarked states remain.
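To see the whole loop run from start to finish on something small, here is a self-contained sketch of the subset construction in modern C#. It follows the steps described above but is not the SubsetMachine class: the NFA (for the regex a*b), the use of '\0' as the eps symbol, the string keys and the FIFO queue are all my own assumptions. The queue replaces picking an arbitrary unmarked state, so the numbering of the DFA states is the same on every run.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical NFA for a*b; '\0' stands for an eps-transition.
const char Eps = '\0';
var edges = new Dictionary<(int From, char Symbol), int[]>
{
    [(0, Eps)] = new[] { 1, 3 },
    [(1, 'a')] = new[] { 2 },
    [(2, Eps)] = new[] { 1, 3 },
    [(3, 'b')] = new[] { 4 },
};
int nfaStart = 0, nfaFinal = 4;
char[] inputs = { 'a', 'b' };

SortedSet<int> Move(IEnumerable<int> states, char symbol)
{
    var result = new SortedSet<int>();
    foreach (int s in states)
        if (edges.TryGetValue((s, symbol), out var targets))
            result.UnionWith(targets);
    return result;
}

SortedSet<int> EpsilonClosure(IEnumerable<int> states)
{
    var closure = new SortedSet<int>(states);
    var pending = new Stack<int>(closure);
    while (pending.Count != 0)
        foreach (int u in Move(new[] { pending.Pop() }, Eps))
            if (closure.Add(u))
                pending.Push(u);
    return closure;
}

// A sorted, comma-joined key lets equal sets map to the same DFA state number.
string Key(SortedSet<int> set) => string.Join(",", set);

var dfaStateNum = new Dictionary<string, int>();
var unmarked = new Queue<SortedSet<int>>();
var dfaFinal = new SortedSet<int>();
var dfaTrans = new SortedDictionary<(int State, char Input), int>();

var first = EpsilonClosure(new[] { nfaStart });
dfaStateNum[Key(first)] = 0;
unmarked.Enqueue(first);

while (unmarked.Count != 0)
{
    var aState = unmarked.Dequeue();        // leaving the queue marks the state
    int from = dfaStateNum[Key(aState)];
    if (aState.Contains(nfaFinal))
        dfaFinal.Add(from);

    foreach (char c in inputs)
    {
        var next = EpsilonClosure(Move(aState, c));
        if (!dfaStateNum.TryGetValue(Key(next), out int to))
        {
            to = dfaStateNum.Count;         // next unused number
            dfaStateNum[Key(next)] = to;
            unmarked.Enqueue(next);
        }
        dfaTrans[(from, c)] = to;
    }
}

foreach (var t in dfaTrans)
    Console.WriteLine($"Trans[{t.Key.State}, {t.Key.Input}] = {t.Value}");
```

For this toy NFA the sketch produces four DFA states: 0 for {0, 1, 3}, 1 for {1, 2, 3}, 2 for {4} (the only final state) and 3 for the empty set, which acts as a dead state that loops to itself on every input.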
For the regex (l|e)*n?(i|e)el*, one run of the code gave me the following DFA transition table: