An alphabet is simply a finite set of symbols, such as those used in various computer applications. A string is essentially the usual data type that you've seen in languages like Java. Sigma star is the set of all possible strings over a particular finite alphabet sigma.
Alphabets like ASCII or Unicode are sets of symbols used in various computer applications. In most of our examples we will use simpler alphabets. Sometimes the alphabet will consist of the digits 0 and 1, called the binary alphabet. Sometimes I'll use a small set of letters like a, b, c, or, in the case of our tennis example, we had the set of s and o as our input alphabet. Another interesting example might be the set of signals used by a protocol, such as transmit, end, and so on.
Okay, a string. First of all, every string is over some particular alphabet, which we'll call capital sigma. It's a list of elements, each of which is a member of sigma. And we'll show strings with no commas or quotes. So for example, abc is a string; it could be over the alphabet of a, b, and c, or it could be over the ASCII alphabet. 01101 is a string, probably over the binary alphabet of 0 and 1, but it could be over any other alphabet that contains 0 and 1. Strings are essentially the usual data type that you've seen in languages like C or Java, although we are going to write them without the special quotes. It's also legitimate to see strings as lists of symbols, as long as they're chosen from some particular alphabet sigma, and of course we never use commas or any other separator between the symbols of the string.
We'll use the notation sigma star to mean the set of all possible strings over a particular finite alphabet sigma. The length of a string is the number of positions, and it should be a familiar idea from a language like Java. And we will use epsilon to stand for the empty string, the string of length zero, something which you might find written as quote-quote in one or another programming language. Okay, so here is an example. First of all, {0, 1} star is the set of all strings over the binary alphabet of 0 and 1.
Here is a list of some of the strings. We'll do it shortest length first, so the empty string is of course of length zero; then 0 is the string consisting of the single character 0; then 1, again a string of length one, with the character 1; and then you have things like 00 and so on, the four strings of length two, and so on. There are an infinite number of strings, of course, in {0, 1} star. You might notice that when I write something like 0, that could either be the symbol 0, or it could be a string of length one that has one position, and in that position is a 0. These are not the same thing; they are different types of objects, no question about that. But you can generally tell whether I am talking about strings or symbols. For example, only strings can be members of languages. Symbols can't be members of a language, although you could obviously have a string of length one which looks like a symbol.
Now, languages are sets of strings, and they can be finite or infinite sets of strings. The only limitation that we place on a language is that there is some finite alphabet from which all the strings of the language are composed. The language L given as an example has {0, 1} as its alphabet. It has all strings that do not have two consecutive 1s. So the empty string is there, and both strings of length one, that is, 0 and 1. Of the strings of length two, all are there except 11, because 11 obviously does have two consecutive 1s. Five of the eight possible strings of length three are there; the other three possible strings do have consecutive 1s. And I haven't shown them all here, but eight of the sixteen strings of length four are present.
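As a rough illustration of these definitions (not part of the lecture), here is a minimal Python sketch that lists strings over the binary alphabet shortest first and keeps those with no two consecutive 1s. The names `strings_over` and `in_L` are made up for this example, and Python's empty string `""` plays the role of epsilon.

```python
from itertools import product

def strings_over(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def in_L(w):
    """Membership test for the example language: no two consecutive 1s."""
    return "11" not in w

# Members of L among the strings of length at most 3.
members = [w for w in strings_over("01", 3) if in_L(w)]
print(members)
```

Sigma star itself is infinite, so the sketch only ever enumerates a finite prefix of it, up to the chosen `max_len`.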
Now, just to throw out a bit of a question: you see that there is one of length zero, two of length one, three of length two, five of length three, eight of length four. You might think about how many there would be of length five. There's an obvious pattern; it would be interesting for you to try to figure that out.
Okay. Now, deterministic finite automata. It's a formalism for defining languages, and that formalism consists of a finite set of states; we typically use capital Q for the set of states of an automaton. I might give that set of states another name, but if I don't tell you anything else, think of it as Q. There will be an input alphabet, which we will typically call capital sigma. There will be a transition function, typically denoted delta. There will be a start state, which is obviously one of the states, so it's in the set Q; we typically write it as q0. And there will be a set of final states, which we will typically call F, and the set of final states is always a subset of the set of all states.
The thing that makes the automaton work is the transition function. This function, usually denoted delta, takes two arguments: a state q and an input symbol a. It gives you back the state that the automaton goes to when it is in state q and the next input symbol to arrive is a. The function delta is total; that is, it has a value for every state and symbol. There are examples of automata where we really don't want to continue in certain situations. For example, our tennis automaton did not have transitions out of the two states where one player or the other has won the game. To fix up such situations, we have to introduce a dead state. A dead state is a state that is not accepting and that has a transition to itself on every input symbol.
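The dead-state fix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `add_dead_state`, the state name `"dead"`, and the tiny two-state automaton are all hypothetical, and the automaton is not the real tennis automaton from the lecture.

```python
def add_dead_state(states, alphabet, delta, dead="dead"):
    """Return (states, delta) with delta made total.

    Every missing (state, symbol) pair is sent to a dead state, which
    loops to itself on every symbol. Assumes `dead` is not already a
    state name. The dead state is never added to the final states.
    """
    total = dict(delta)
    missing = [(q, a) for q in states for a in alphabet if (q, a) not in delta]
    if not missing:
        return set(states), total
    for q, a in missing:
        total[(q, a)] = dead
    for a in alphabet:
        total[(dead, a)] = dead
    return set(states) | {dead}, total

# Hypothetical partial automaton: state "won" has no outgoing transitions.
states = {"start", "won"}
alphabet = {"s", "o"}
delta = {("start", "s"): "won", ("start", "o"): "start"}
all_states, total_delta = add_dead_state(states, alphabet, delta)
```

After the call, `total_delta` has exactly one entry for every state and symbol, which is what it means for the transition function to be total.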
Once you get to a dead state you cannot leave, and you can never accept, so dead is a pretty good description of what is going on. We're going to use the abbreviation DFA for deterministic finite automaton. The deterministic means that there is a unique transition for every state and input symbol. We're going to meet nondeterministic automata soon, and there it is possible to transition to many states from one state on one input.
Here is the tennis example. Notice that the two final states have no transitions out. So we add a dead state, and all the missing transitions go to that state. The transitions from the dead state are to itself on all possible inputs; in this case that would be just s and o. Notice this is probably the first example where we have more than one symbol labeling an arc. That's just fine.
Now, we're going to represent DFAs in general by graphs, pretty much as was done in the tennis example. Nodes of the graph correspond to states of the DFA. An arc represents the transition function: an arc from, say, state p to state q will be labeled by all those symbols, say a and b, that have transitions from p to q. There will be an arrow labeled start to the start state, and we'll indicate final states by double circles, so if q is final, we would put a double circle around it.
This is an interesting example of an automaton that processes text. The goal is to recognize when the string read so far ends in ing. The start state represents the condition where we have made no progress towards seeing ing. If in that state we see an i, then we have made some progress, so we go to a state that says i was the last symbol seen. That's the state that we call saw i. Otherwise, we stay in the start state, so there's a transition on every letter but i to the start state, which says that we've made no progress.
Now, from the saw i state, if we next see an n, then we've made more progress, and we go to the state saw in, which says we've seen the sequence in. On the other hand, if from saw i we see another i, then we've not made progress, but neither have we lost ground. We may be reading a word like skiing, with a double i, and thus the transition from state saw i on i is to itself. On any symbol other than i or n, we go from the saw i state back to the start state, where we've seen nothing. In state saw in, if we next see g, then we win: we have just seen ing, and we go to the accepting state, which is saw ing. On the other hand, if we're in state saw in and we see an i, then the pattern in is broken, but a new pattern beginning with i has started, so we go back to the saw i state. On any input other than i or g, including an n, from the saw in state we have made no progress at all, so we go back to the start state. Finally, from the state saw ing, we go back to saw i if we see another i, or, if we see anything but an i, we go back to the start state, where we've seen nothing.
Here's another example. It's an automaton that represents the simplest possible protocol for sending data. The program is in one of two states, ready and sending. It starts in the ready state, and eventually it gets a signal that some data has been loaded into its buffer. At that point it enters the sending state, where it does whatever is necessary to transmit the contents of the buffer. The receiver will send an ack symbol when the content is received, in which case we can return to the ready state. However, if the receiver is down, we may instead get a local timeout signal that warns something is wrong and the buffer must be retransmitted. This automaton is not complete. There are no final states, because the automaton is designed to run forever without rendering a decision. Also, there are missing transitions.
Now, it is okay to ignore the ack or timeout signals when you are in the ready state, staying in the ready state, so we might draw, for example, an arc from ready to ready labeled with the signals timeout and ack. That's probably okay. However, a data in signal in the sending state is an indication of an error, so we might want to go to an error state. So we're going to have an error state, and if you are in the sending state and you receive a data in signal, you go to that error state. Now, each state, of course including error, needs a transition out on each of the three signals: data in, ack, and timeout. A technical question that's asked about protocols is whether you can get to an error state. We could, for example, make the error state the final state, so I'll put a double circle here. And, by the way, it's really a dead state, so I probably ought to add transitions to itself on each of the three input symbols. One of the interesting things about finite automata is that it's possible to answer questions like: can this automaton ever get into the error state? That's a question, of course, that you could not answer about programs in general.
For a running example, we're going to use this automaton. Its language is the set of binary strings that do not contain two consecutive 1s. State A is where the automaton will be whenever the input string seen so far is good, that is, it contains no consecutive 1s, and also it does not end in a 1. Surely this state should be the start state, since when no input has been received, there are no two consecutive 1s, and moreover the input does not end in a 1. We get to state B when the input is good, that is, no two consecutive 1s, but the last symbol seen is a 1. Notice that the only way to get to B is to be in A and then get an input 1. C is actually a dead state. We're there whenever two consecutive 1s have been received.
We arrive at C for the first time from B, which you recall means that the previous input was 1, and in state B we receive a second 1. Once in C we stay there, because once a string has 11 you can never undo that fact, no matter how many 0s you see.
Here is the other representation that we're going to use for finite automata: a transition table. The transition table that we are showing here represents the same automaton as the little graph in the corner. In the table, the rows correspond to the states, so you can see there is a row for A, B, and C. The columns correspond to the input symbols; we have columns for 0 and for 1. A final state is indicated by starring it in its row, and we put an arrow next to the start state. The entries of the table are the values of delta. So, for example, this entry is in the row for B and the column for 1, so it represents delta of B and 1, that is, the transition that you make from state B when you get input 1. And you can see on the automaton that if you're in state B and you see a 1, you go to state C. So that's why the entry there is a C.
Now we're going to use a convention which is actually very important, because it reminds you of the types of things that might otherwise be confusing. That is, we will use lowercase letters near the end of the alphabet, w, x, y, z, and maybe u occasionally, to represent strings of input symbols. On the other hand, letters at the beginning of the alphabet, typically a, b, or c, will represent single input symbols; remember, these are analogous to characters in a language like Java or C.
Now, the delta function was defined so that you give it a state and you give it a single input symbol.
We want to extend it so that you can give the function an argument that's a state and another argument that's any string of symbols, including the empty string. We want the extended delta applied to state q and any string w to tell us where the automaton gets to if it follows the path in the transition diagram from q whose arcs are labeled by each of the symbols of w in order. That is, we look for the unique path whose labels form w. In the text we put a hat over the delta to remind us that it is the extended version. However, as we are going to see very shortly, the extended delta agrees with the given delta when the string w is of length one, that is, when it's a single symbol. Thus there's not really a need to distinguish the extended and the original deltas, and we're not going to do that here.
Now, the extended delta is defined inductively, by an induction on the length of the string. The basis says that delta of q and the empty string is just q. That is, if you're in state q and nothing arrives on the input, then you stay in state q. That's the nature of how an automaton works. For the induction, suppose the input string is wa. Remember our convention: whenever you see something like wa, you know that w is a string of some length, possibly even empty, and a, being at the beginning of the alphabet, is a single symbol. This is sort of the mathematical analog of type declarations for program variables. Now, the inductive rule says that we first see where you get from state q on the string of inputs w. That is, we recursively use the extended delta to figure out what delta of q and w is. That is going to be some state; call it p. Then we apply the original delta function to that state p and the last input symbol a.
Okay, so here is an example. I am using the transition table representation of the same automaton we have been playing with.
I want to figure out what the extended delta is from state B and 011. Now, 011 is broken up into a string, 01, and a final symbol, 1. So what that tells us to do is figure out what delta of B and 01 is, where 01 is everything but the last symbol, and then apply the original delta function to that state and the last symbol 1. Now, what's delta of B and 01? Well, it's a string and another symbol, so I can break that up using the recursive rule to say that it's the standard delta function applied to delta of B and 0 and the final symbol 1. Now, I know what delta of B and 0 is; I can look it up. B and 0 is A, so I can replace this by an A. Now, I know what delta of A and 1 is. I'll look it up: A and 1, that's B. So I replace that by B. And now what's delta of B and 1? It's C.
As I mentioned, the extended delta wears a hat in the textbook. However, we really don't need to distinguish the two deltas, because they agree on single symbols. That is, if we want the extended delta for a state q and a string consisting of one symbol a, formally we treat that string as the empty string followed by the symbol a. Then we have to compute delta hat of q and epsilon. But we know by the basis rule that that's q itself, and therefore the extended delta of q and a is just the original delta applied to q and a. In fact, I really cheated on the previous slide. I needed delta hat of B and 0, and I just went to the table and looked it up as if it were delta of B and 0. Now we see that they are the same, so no harm, no foul.
There are many different kinds of automata. We've seen only deterministic finite automata so far, but there are many others. No matter what kind of automaton, its job is to define some language. We'll use the notation L of A to refer to the language defined by the automaton A.
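The inductive definition of the extended delta, and the worked example from state B on 011, can be checked with a short Python sketch. This is an illustration rather than anything shown in the lecture: the table `DELTA` is transcribed from the transitions described for the running automaton, and the names `DELTA` and `delta_hat` are chosen just for this example.

```python
# Transition table of the running example: one row per state, one column per symbol.
DELTA = {
    "A": {"0": "A", "1": "B"},
    "B": {"0": "A", "1": "C"},
    "C": {"0": "C", "1": "C"},
}

def delta_hat(q, w):
    """Extended transition function, following the inductive definition."""
    if w == "":               # basis: delta_hat(q, epsilon) = q
        return q
    x, a = w[:-1], w[-1]      # induction: w = xa, with a the last symbol
    return DELTA[delta_hat(q, x)][a]

print(delta_hat("B", "011"))  # the worked example: B -> A -> B -> C
```

On a one-symbol string the recursion bottoms out immediately, so `delta_hat(q, a)` is just `DELTA[q][a]`, which is the agreement between the two deltas noted above.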
If A is a deterministic finite automaton, then the language it defines is the set of strings that take the start state to a final state. That is, q0 is the start state, and the language is the set of strings that get you from there to a final state. Formally, the language of A is the set of strings w such that delta of q0 and w is in F.
Okay, so here's an example. The string 101 is in the language of our running DFA. We start in the start state, A. We follow the first 1; that takes us to state B. From B we follow the 0, which gets us back to A. From A we follow the third symbol, which is a 1, getting us to B. B is an accepting state, so 101 is in the language. That makes sense, given that the automaton is supposed to accept the language of all strings that don't have two consecutive 1s, and obviously 101 does not.
This expression is called a set former. It starts with a curly brace and then an expression representing the things that we want to put in the set. In this case, the expression simply says that the set consists of some strings w. We know again that they're strings because of our convention that the letters at the end of the alphabet represent strings. The vertical bar can be read such that, and then there follows a description of what must be true for something to be a member of the set. In this case, it's saying that w is a binary string, that is, it's in the set of strings over the alphabet 0, 1, and w does not have two consecutive 1s. That's what I've been telling you all along the language of this automaton was.
We're going to prove that the DFA that we've been playing with accepts the language I claimed it does: the set of binary strings without consecutive 1s. I'm going to spend a good deal of time proving this simple result because it will give you all the details about how one proves something about languages.
In the future I'm not going to be so focused on proofs, but I think it's important that everyone go through one of these proofs in all of its gory detail. To prove two sets S and T equal, we generally need to prove just two things: that each is contained in the other. That is, we start by assuming w is a member of one, say S, and we use that fact to prove that w is in the other, T. Then we start all over and we assume that w is in T, and we use that to prove w is also in S. In what follows, we take S to be the language of the DFA that we've been playing with and T to be the set of binary strings without consecutive 1s.
First, we're going to prove that if w is accepted by the automaton, then w has no 11, and the proof is an induction on the length of w. It turns out that if we simply try to prove this statement, we fail, and I'll point out in a few slides what goes wrong. A common trick for inductive proofs is to make a more detailed statement than you really want, because it makes the inductive proof work. Here you need to distinguish whether an accepted string gets you to state A or B, because we need to know whether or not the string ends in 1, even though the conclusion, no 11, is true in both states A and B. The inductive hypothesis will have two parts. Part one says that if w gets you to state A, then not only is w good, in the sense that it doesn't have consecutive 1s, but it doesn't even end in 1. Part two says that if w gets you to state B, then w is still good, but it must end in 1.
The basis is when the length of w is zero, that is, w is the empty string. By the way, notice that the bars around a string mean that we want the length of that string, so bar w bar means the length of w. Now let us prove part one of the basis. Delta of A and the empty string does equal A, so the if part is true, but the conclusion is also true, because the empty string obviously does not have consecutive 1s and does not end in 1. For part two, things are a little trickier.
It is false that delta of A and the empty string equals B, but it is also false that the empty string ends in a 1. However, an important principle of logic is that the statement false implies false is true. That is, whenever the if portion of a statement is false, it doesn't matter whether the then portion is true or not; the statement as a whole is true. Thus a statement like 'if I am Superman, then I wear red undershorts' is a true statement simply because I am not Superman. You don't have to concern yourself with the color of my undershorts. The mathematical term for an if-then statement that is true because its hypothesis is false is that the statement holds vacuously.
To begin the proof of the inductive step, we assume w is a string of length at least one, and we assume that the inductive hypothesis, that is, statements one and two about strings that get the automaton to states A and B, is true for all strings shorter than w. Let w equal xa; by our convention, a is the last symbol of w, and x is all the symbols, possibly none, up to but not including the last symbol of w. Since x is shorter than w, we may assume the inductive hypothesis for x. We're going to prove both statements one and two for w.
Let's start with one, that is, if delta of A and w is A, then w is good and does not end in 1. How do you get to A by reading string x followed by symbol a? Well, look at the diagram. The only transitions into A are on input 0, one from A and one from B. So a must be 0. That immediately lets us conclude that w does not end in 1. Furthermore, these transitions into A are only from A and B. Thus x must get us to either A or B. In either case, we can conclude using the inductive hypothesis that x is good: it has no consecutive 1s. Thus w, which is x followed by 0, has no consecutive 1s and surely does not end in 1.
Now for part two: if delta of A and w is B, then w is good and ends in 1. Now, there is only one way to get to B.
You have to be in state A and the input has to be 1. Thus if w is x followed by a, then we know x gets us to state A and the input symbol a is 1. We can therefore conclude that w ends in 1. Now we apply the inductive hypothesis to x, and we conclude that x not only has no consecutive 1s, but it doesn't end in 1, because it got us to state A. Thus the fact that w is x followed by 1 doesn't allow the possibility that w ends in 11. And since any occurrence of 11 in w would either have to be at the end or lie completely within x, we conclude that w does not have any consecutive 1s.
Notice that if we don't use this more complicated inductive hypothesis, where we distinguish between states A and B according to whether string x ends in 1, then we cannot make the inductive proof. If we know only that x is good whenever it gets us to A or B, then for all we know x might end in 1 and get us to state A, in which case w equal to x1 would have two consecutive 1s and yet get us to B. Therefore, we would not be able to push through an inductive hypothesis that didn't distinguish between whether we get to state A or B.
Well, we're not done. We still have to prove that T is contained in S, that is, if w is a good string, with no consecutive 1s, then it's accepted by the automaton. It is helpful to restate what we need to prove in its contrapositive form, which is logically equivalent to the original. The contrapositive of an if-then statement, say if X then Y, is if not Y then not X. We can see why this is an equivalent statement, since if Y is false it couldn't be that X is true, because whenever X is true Y is true. In this case X is the statement that w is a good string, that is, it has no 11, and Y is the statement that w is accepted by the automaton. The contrapositive is that if w is not accepted by the automaton, then w is not a good string, that is, it contains 11 as a substring.
Because there is a unique transition from every state on every input symbol, each w gets the DFA to exactly one state. Thus the only way w is not accepted is if it gets you to C. Notice that the only way to get the automaton to C is for some string x to get it to B, and then for input 1 to follow. Once in C, you stay in C, so anything can follow the x and the 1. We'll call that y. That is, any string w that gets the automaton to C must be of the form x1y, where x gets us to B. We already observed that the only way to get to B is by a string that ends in 1, since the only transition into B is on a 1. Thus x must be of the form z1 for some z. Thus w can be written as z11y, and we can conclude that if delta of A and w is C, then w is bad; that is, it definitely contains two consecutive 1s.
Now we introduce a class of languages called regular languages. These are the languages that have a DFA accepting them. That means that the language is exactly the set of strings accepted by that automaton. Soon we shall see that there are several other ways to describe the regular languages, including regular expressions and several forms of nondeterministic automata. While many common languages are regular, there are also many that are not. Intuitively, finite automata cannot count beyond a fixed number; thus they cannot do things like check whether they have seen the same number of 0s as 1s on the input, or check that the parentheses are balanced in an arithmetic expression. For these tasks we need more powerful mechanisms, such as context-free grammars, which we shall meet soon enough.
Here is an example of a language that is simple to understand but is not a regular language. To understand what this notation is saying, we first need to know that an exponent i on a symbol is shorthand for the string consisting of i copies of that symbol.
Thus 0 to the fourth is shorthand for 0000. We read the set former as: 0 to the n, 1 to the n, such that n is greater than or equal to one. That is the set of strings consisting of n 0s followed by n 1s, such that n is at least one. The strings in the language L1 are thus 01, 0011, three 0s followed by three 1s, and so on; you get the pattern.
Okay, here's another example: the set of w such that w is in the set of strings formed from the left and right parentheses, and w is balanced. I hope people are familiar with the idea of balanced parentheses, but intuitively they're just those sequences of parentheses that can appear in some arithmetic expression. For example, the string ()() could appear in an arithmetic expression like (a + b) * (c + d); notice you've got left, right, left, right. As an example of a string that is not balanced, right then left is obviously not balanced: you can't ever have, in any prefix of the string, more right parentheses than left parentheses. Another example of a non-balanced string would be left, left, right, because there you have more lefts than rights.
However, regular languages are common. For example, in each programming language there is a format for floating-point numbers, and this format can be quite complicated, with optional e's, optional decimal points, and strings of digits that may or may not be empty. But in all programming languages I know about, the set of strings that represent some floating-point number forms a regular language.
This is a very interesting case illustrating what finite memory really means. We want to know if a binary number is divisible by 23. We're going to read the bits high-order first, but we have only finite memory.
How can we remember exactly what sequence of bits has been read, since the sequence can grow very long, in particular longer than the number of states, or longer than any limit that you might care to put on it? But there's a trick. We don't really need to remember everything about the bits read. It's sufficient to remember what the remainder is when we divide the number by 23. Thus, we're going to have 23 states, corresponding to the 23 possible remainders when an integer is divided by 23. These are of course 0 through 22. The start state is 0, because we interpret the empty string as representing zero. That may be a bit of an assumption, but nothing else makes sense, and treating it as zero makes everything work out right. State 0 is also the only final state, since we want the inputs that leave a remainder of 0 when divided by 23.
We're going to assume that things are working right after reading a binary string w. That is, w takes state 0 to the state that is the correct remainder when w is divided by 23. I'm using the C-style percent operator to denote remainder. You can also read the percent as modulo, which is just another way of saying the remainder when divided by. So i % 23 is the remainder when i is divided by 23, or i modulo 23. The transition from each state i on input 0 is to the state that is the remainder of 2i divided by 23. To see why this works, let i stand for the integer read so far, which may be much larger than 22. We know that i can be written as i = 23a + b for some integer a and some integer b in the range 0 to 22. That is, b is the remainder, or i modulo 23. Then what is 2i? It's 46a + 2b. Since 46 is divisible by 23, we can forget the 46a term, so what we have left is that 2i has the same remainder as 2b.
Now if we want the remainder of 2i when it is divided by 23, we can just take the remainder of 2b divided by 23. The important point is that we never need to know what a is, and we never need to know i exactly. We just need to know its remainder. Now, for the same reason, when a 1 arrives on the input, we can go from state i to the state that is the remainder of 2i + 1 when that's divided by 23. That's 2i + 1 modulo 23.
For some examples, take delta of 15 and 0, that's state 15 and input 0. Double the 15, you get 30, and then you take the remainder of 30 divided by 23, and that's 7. So on input 0, state 15 goes to state 7. For another example, suppose you're in state 11 and a 1 comes in. Well, twice 11 is 22, plus 1 is 23. So it's 23 mod 23, which is 0. So that says whenever you're in state 11 and a 1 comes in, you go to state 0. And what that means is that any string that leaves a remainder of 11, no matter how long that string is, if you follow it by a 1, will be exactly divisible by 23.
Interestingly, you can also recognize all the binary strings divisible by 23, or any other particular number you like, if we read the strings in backwards, that is, low-order bit first. For example, the string 01110100 is in this language, because if you reverse it you get the string 00101110. And if you interpret that as a binary number you get 46, and obviously 46 is divisible by 23. Now, I challenge you to construct the DFA for this language L4, but it exists. There's a theorem, which we're soon going to see, that says if a language is regular, then its reversal, that is, what you get by reversing each of its strings, is also a regular language.
The proof of the theorem will let us construct the DFA for the reverse language from the DFA we just saw.
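For the high-order-first automaton described earlier, the 23 states and their transitions are simple enough to sketch in Python. This is a minimal illustration, not code from the lecture; the names `M`, `delta`, and `accepts` are chosen here, and bits are assumed to arrive as the characters "0" and "1".

```python
M = 23  # the modulus; the states are the remainders 0..22

def delta(i, bit):
    """Transition on one bit, read high-order first: remainder of 2i + bit."""
    return (2 * i + int(bit)) % M

def accepts(w):
    """True if the binary string w (high-order bit first) is divisible by M."""
    state = 0                  # the empty string is treated as the number 0
    for bit in w:
        state = delta(state, bit)
    return state == 0          # state 0 is the only final state

print(delta(15, "0"), delta(11, "1"))  # the two transitions worked out earlier
print(accepts("00101110"))             # 46 written in binary with leading zeros
```

The function never stores the number itself, only a remainder below 23, which is the sense in which the automaton gets by with finite memory. The low-order-first language would need a different construction, which this sketch does not attempt.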