Computers 101: Deterministic Finite State Automata

Topic
A discussion of deterministic finite automata and basic theory of computation.

Seminar
15:01 <TRWBW> maybe give it a minute for stragglers, but the seminar is going to be an introduction to deterministic finite state automata, from scratch
15:02 <TRWBW> so i'm going to start with an informal description of computation
15:03 <TRWBW> then modify it to get into a form that can be modelled with a DFA (deterministic yada ...)
15:03 <TRWBW> then give a formal definition and show some examples
15:03 <TRWBW> um, anyone here who *hasn't* seen DFAs before?
15:04 <Kasadkad> I haven't seen enough to really know what they are
15:06 <TRWBW> okay, well i guess i'll start then. i'm starting very basic, i can speed up or slow down if asked. kinda the advantage of a live talk is that you can ask me to do stuff, so might as well use that
15:06 <TRWBW> so an informal description of a particular computation: adding up a list of numbers
15:06 <TRWBW> i'll describe it as, another person has a list of numbers
15:06 <TRWBW> you ask them for numbers one at a time, integers, they give you the next on the list
15:07 <TRWBW> you have a pad of scratch paper, so you start with a total of 0, and add each number as you get it to get a new total
15:07 <TRWBW> when you ask for the next number and the guy says there are no more, you tell him the total
15:08 <TRWBW> the DFA model has some similarities to that, but some distinctions, even informally
15:08 <TRWBW> the similarities are that a DFA processes a sequence of inputs, one at a time, in steps, and at the end produces the output, and what it does on each step is completely determined mathematically
15:09 <TRWBW> the main difference is in the concept of state
15:09 <TRWBW> in that informal example, your state is your pad of paper. and as i described it, it could store anything you wanted. in particular, there was no problem with the partial total getting too big to write down on your scratch pad
15:10 <TRWBW> it's state in the sense that if the process was interrupted, you took a break or something
15:11 <TRWBW> you could come back and pick up where you left off. the subtotal on your paper is all the information you need from the preceding part of the computation to continue it to the output
15:11 <TRWBW> a DFA is by definition only allowed state that takes on one value from a finite set. that's the "finite state" in deterministic finite state automata
15:12 <TRWBW> another distinction is that while adding up takes inputs from an infinite set, the set of integers, DFAs take inputs from again a finite set (a different one)
15:13 <TRWBW> a third distinction is that while adding produces a number as an output, DFAs are very limited in the output they produce. they produce a single "yes" or "no" value, or using the usual terminology, "accept" or "reject"
15:13 <TRWBW> so they are a very simplified model of computation, but useful because that makes them easier to define and prove theorems about. they also serve as a building block for more general models.
15:14 <TRWBW> to make my informal example more DFA-like, i'll change it to adding up one-digit numbers, in fact more restricted, 0's and 1's, so that the input is now a sequence of values from the finite set {0,1}
15:15 <TRWBW> that set {0,1} is called the alphabet of the DFA. in fact all the DFA terminology has a bit of a bias towards text, but it's just a finite set.
15:16 <TRWBW> to make the output "accept" or "reject" i'll change it to the DFA determining whether the list sums to a number that is odd. if it's odd, "accept". if it's even, "reject". i'll include the case of an empty list of numbers as "even"
15:17 <TRWBW> those changes allow the DFA to get by with only a finite number of values of state, or "states". it only has to remember whether the previous subtotal was even or odd to get the final answer.
15:17 <TRWBW> so a DFA that does this can be represented graphically as:
[diagram not preserved in this log: start state "even", accepting state "odd"; a 0 keeps you in the same state, a 1 switches to the other one]
15:18 <TRWBW> kilimanjaro, mopped: make sense so far?
15:18 <mopped> yes
15:20 <TRWBW> i'll introduce some terminology. the input to a DFA is a finite, ordered sequence of values from its alphabet set, here {0,1}. in the context of DFAs they are called strings, and usually written without commas. so 010011 would be the sequence 0,1,0,0,1,1
15:20 <mopped> or rather, does the double circle around 'odd' mean anything significant?
15:20 <TRWBW> the symbol e will denote the "empty string", a sequence of no values. empty strings are perfectly valid, they show up all the time.
15:20 <TRWBW> k, yes
15:21 <TRWBW> those circles are the states of the DFA. two of them have been specially marked, one has an arrow from nowhere to it. that's the "start state". another has a double circle, that means it's an "accepting state".
15:22 <TRWBW> you always have one and exactly one start state. any state, including the start, can be marked as accepting or not. here i just happened to have one, and it's different from the start, but that's accidental. you can have every state accepting, no state accepting, or anything in between.
15:23 <TRWBW> a diagram like that fully defines the DFA. given one, you can determine whether the DFA accepts or doesn't accept a particular string by simulating it based on the diagram
15:24 <TRWBW> for example, for the string i gave, 010011, you would start in the state with the arrow from nowhere, the start state, and since the first symbol is 0, you would follow the arrow labeled 0 back to the start state "even"
15:24 <TRWBW> then next the 1 takes you from "even" to "odd" along the arrow labeled 1
15:25 <TRWBW> then 0 takes you from "odd" to itself, so you stay in odd, then again for the next 0, then a 1 takes you back to even, and the last 1 takes you to odd
15:25 <TRWBW> now odd has a double circle, which means it's an accepting state, so the DFA accepts the string 010011. which it should, since 0+1+0+0+1+1=3 is odd
15:26 <TRWBW> i've given the requirement that it have a start state, and that every state is either marked as accepting or not. the last requirement is on the arrows
15:27 <TRWBW> every state must have an arrow leading from it for every symbol in your alphabet. as you can see in the diagram, those arrows are allowed to lead back to the state they come from, but to be a DFA you are forbidden both from having a state which for some symbol doesn't have an arrow out (labelled with that symbol), and from having a state where for some symbol it has more than one arrow out (labelled with that symbol)
15:28 <TRWBW> i'll give another example, before i move on to formal definitions, this one with an alphabet of {a,b} instead of {0,1}.
[diagram not preserved in this log]
15:29 <TRWBW> mopped: give it a try, would that DFA accept the string abba ?
15:31 <mopped> i'm going with no, as it lands on qa
15:31 <kilimanjaro> (Gotcha so far, I was wrangling some tacos earlier.)
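A minimal Python sketch of the simulation walked through above, assuming the even/odd diagram is encoded as a lookup table keyed by (state, symbol). The names `TRANSITIONS`, `run` and `accepts` are illustrative, not from the talk.

```python
# Even/odd DFA from the talk, encoded as a table keyed by (state, symbol).
# Reading 0 keeps the parity of the running sum; reading 1 flips it.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
START = "even"        # the state with the arrow "from nowhere"
ACCEPTING = {"odd"}   # the double-circled state


def run(string):
    """Follow one arrow per input symbol; return every state visited."""
    state = START
    visited = [state]
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]  # exactly one arrow per (state, symbol)
        visited.append(state)
    return visited


def accepts(string):
    """Accept exactly when the last state visited is an accepting state."""
    return run(string)[-1] in ACCEPTING


print(run("010011"))      # ['even', 'even', 'odd', 'odd', 'odd', 'even', 'odd']
print(accepts("010011"))  # True, since 0+1+0+0+1+1 = 3 is odd
print(accepts(""))        # False: the empty list counts as even
```

The table lookup raising `KeyError` on a missing entry is a rough stand-in for the rule that every state needs an arrow for every symbol.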
15:31 <TRWBW> yes
15:32 <TRWBW> okay, for any DFA, and any particular string, you can say unambiguously whether it accepts or doesn't accept that string.
15:33 <TRWBW> that lets you talk about the *set* of strings that the DFA accepts. again in keeping with the bias towards text terminology, that's called the "language" of the DFA
15:33 <TRWBW> perhaps a bad choice of terminology, but that's what everyone uses. but all it is mathematically is some set of strings over the alphabet
15:34 <TRWBW> here's another
[diagram not preserved in this log]
15:35 <TRWBW> mopped: maybe take a look at that and see if you can figure out an informal description of what its language is
15:37 <TRWBW> if anyone new to DFAs is following, suggest you take a moment to try and figure out what the language of that DFA is, the last one
15:38 <TRWBW> you could start with some examples, does it accept the empty string? what about: a,b,ab,ba,aa,bb
15:38 <mopped> Uh, it begins/ends with the same symbol?
15:38 <TRWBW> yup
15:39 <TRWBW> i'll give a formal definition of the DFA to make it more mathy, but basically the diagrams define them already, so nothing new there
15:41 <TRWBW> so to summarize, i've defined a structure for which you can say whether it accepts or rejects a particular string. this means the structure defines a language (set of strings) that it accepts. the rest of the talk will be about some theorems about what kinds of languages can be defined by a DFA (accepted by the DFA), and what kinds can't
15:42 <TRWBW> so the standard 5-tuple definition of a DFA: M=(Q,E,t,q0,A) where Q is a finite set (states), E is another finite set (alphabet), t is a transition function t:QxE->Q (the arrows), q0 is a member of Q (the start state), and A is a subset of Q (the accepting states, marked with double circles on the diagrams)
15:44 <TRWBW> a string is a finite sequence of symbols from the alphabet E. E* denotes the set of all possible strings over E, and L(M), a subset of E*, is the language (again, just a set of strings) that M accepts
15:44 <TRWBW> mopped: does this kind of terminology make sense to you?
15:45 <mopped> not at all, but go on
15:46 <mopped> I understand what you're talking about, I'm just a bit wiery of the M = part
15:47 <mopped> wiery/wary :p
15:47 <TRWBW> mopped: it just says those are the parts of it
15:47 <mopped> ok
15:47 <TRWBW> mopped: if you have those 5 things, those define your DFA.
15:47 <TRWBW> mopped: since i think you already get the idea, the formalism isn't a big idea, but it's traditional, so i'll finish it for form's sake
15:48 <mopped> sure
15:48 <TRWBW> the function t is the key part of the DFA, it corresponds to the arrows in the diagrams. so t(q1,s)=q2 means that there is an arrow labeled s from q1 to q2
15:49 <TRWBW> the requirement i gave before, one and only one arrow for each state and symbol, corresponds to t being a valid function
15:49 <TRWBW> you can then extend t inductively from s being a single symbol, 0 or 1 or a or b or such, to w a string of symbols
15:50 <TRWBW> you start with e, the empty string, "", and define t(q,e)=q. then for a string w=s1s2...sk, you define t(q,s1s2...sk)=t(t(q,s1s2...s(k-1)),sk)
15:51 <TRWBW> which is a bit of abuse of notation, but it just means that t(q,w) is the state you would end up in if you started in state q, and followed the arrows corresponding to the symbols of w
15:52 <TRWBW> so then L(M)={w in E*: t(q0,w) in A} becomes the formal definition of the language of M. it's the set of strings, from the universe of all possible strings over the alphabet, such that if you start in the start state, and follow the arrows, you end up in a state that's in the set of accepting states.
15:54 <TRWBW> mopped: so i guess defining DFAs is done, so after that comes theorems. there are a couple that would make sense at this point. one is proving that if you have machines M1 and M2, there exists a machine M3 such that L(M3)=L(M1) u L(M2), where that u is set union. another is something called the pumping lemma, that lets you prove there are some languages for which no DFA exists.
15:55 <TRWBW> mopped: or could do something ambitious, like define regular expressions -- you might have seen them if you've programmed in perl or such -- and prove that any DFA corresponds to a regular expression with the same language.
15:55 <TRWBW> mopped: any of those sound worthwhile?
15:55 <mopped> I've read about the latter (pumping lemmas)
15:55 <subconscious> !
15:56 <mopped> Something about palindromes having no DFA? The proof the book provided was a bit over my head, but then again I wasn't too sure on my finite automata then :P
15:56 <TRWBW> mopped: yes, could go over that again.
15:57 <subconscious> great I am on /ignore
15:57 <TRWBW> subconscious: did you say something i missed?
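The 5-tuple and the inductive extension of t translate fairly directly into code. Below is a sketch, assuming states and symbols are Python strings and that t is stored as a dict from (state, symbol) to state; the class name `DFA` and the method name `t_hat` are hypothetical. `t_hat` is written recursively to mirror t(q,s1...sk)=t(t(q,s1...s(k-1)),sk); for long inputs a loop would avoid Python's recursion limit.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """M = (Q, E, t, q0, A). The alphabet E is stored as `alphabet`."""
    states: frozenset     # Q
    alphabet: frozenset   # E
    t: dict               # t : Q x E -> Q, stored as {(q, s): q'}
    start: str            # q0
    accepting: frozenset  # A, a subset of Q

    def __post_init__(self):
        if self.start not in self.states or not self.accepting <= self.states:
            raise ValueError("start and accepting states must belong to Q")
        # A dict key holds one value, so "more than one arrow" cannot occur here;
        # check there is at least one valid arrow, i.e. t is defined on all of Q x E.
        for q in self.states:
            for s in self.alphabet:
                if self.t.get((q, s)) not in self.states:
                    raise ValueError(f"no valid arrow for state {q!r}, symbol {s!r}")

    def t_hat(self, q, w):
        """Extended t: t(q, e) = q and t(q, s1...sk) = t(t(q, s1...s(k-1)), sk)."""
        if w == "":
            return q
        return self.t[(self.t_hat(q, w[:-1]), w[-1])]

    def accepts(self, w):
        """w is in L(M) exactly when t(q0, w) is in A."""
        return self.t_hat(self.start, w) in self.accepting


parity = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset({"0", "1"}),
    t={("even", "0"): "even", ("even", "1"): "odd",
       ("odd", "0"): "odd", ("odd", "1"): "even"},
    start="even",
    accepting=frozenset({"odd"}),
)
print(parity.t_hat("even", "0100"))  # odd
print(parity.accepts("010011"))      # True
```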
15:59 <kilimanjaro> Although we aren't really taking a poll, I'd like to hear more about a) argument that there are languages not accepted by dfa, and b) link between dfa and regexp.
16:02 <TRWBW> kilimanjaro: the former is just the pumping lemma, and then you can demonstrate plenty of languages, palindromes, a^n.b^n for n natural, etc.
16:03 <kilimanjaro> lets go for it
16:03 <TRWBW> kilimanjaro: for the latter there is a short proof using generalized non-deterministic finite state automata, but i prepared a proof that for any M, there is a regular expression for L(M) that doesn't require any of the later machinery
16:04 <TRWBW> it's not short in the sense there are a bunch of cases, but they are all similar, so after the first case the rest become easy to figure out
16:04 <TRWBW> anyways, your call, you seem to be the audience ;)
16:04 <mopped> I don't mind either way, both seem interesting
16:05 <TRWBW> mopped: well have you seen regular expressions before? if you program they are popular for text processing, so you might have
16:05 <kilimanjaro> well I'm most curious about the link between regular expressions and dfa, it gets a little bit less mathy but I have heard people say things like "programming language x has regular expressions but not in the sense of formal languages", etc
16:05 <kilimanjaro> so it would be interesting
16:05 <mopped> yeah I have
16:06 <TRWBW> okay well to summarize formally, you are talking about building up languages starting with some basic languages and then combining them with operations
16:07 <TRWBW> the basic languages are the empty language, the language containing the empty string (those are different), and languages containing a single string of a single symbol
16:07 <TRWBW> so {},{e}, and the languages {s} for all symbols s in your alphabet E
16:07 <TRWBW> mopped: does that terminology work for you?
16:08 <mopped> yeah
16:08 <TRWBW> the operations are union, i'll use | for that, concatenation, i'll use . for that or just write them next to each other, and the star operator *
16:09 <TRWBW> union is just set union, L=L1|L2 means L={w: (w in L1) or (w in L2)}. so one or the other (or both)
16:10 <TRWBW> concatenation for strings means one starts at the end of the other, so abcd.efg=abcdefg, just join them. for languages you join each string from the first with every one from the second
16:10 <TRWBW> so {e,a,abcd}.{cc,b}={cc,acc,abcdcc,b,ab,abcdb}
16:11 <TRWBW> formally L=L1.L2 means L={w: there exist strings w1 in L1 and w2 in L2 such that w=w1.w2}
16:12 <TRWBW> L* is a bit trickier if you haven't seen it. you can define it as L*={}|L|(L.L)|(L.L.L)|.....
16:12 <TRWBW> ugh typo
16:12 <TRWBW> L* is a bit trickier if you haven't seen it. you can define it as L*={e}|L|(L.L)|(L.L.L)|.....
16:13 <TRWBW> so for L={a,bb}, L*={e,a,bb,aa,abb,bba,bbbb,...}, any string you can get by concatenating strings from L onto each other, including the empty string
16:14 <TRWBW> mopped: so if i wrote: {a}((b,c}*|{aa}) would you be able to describe that language?
16:14 <TRWBW> typo again
16:14 <TRWBW> mopped: so if i wrote: {a}({b,c}*|{aa}) would you be able to describe that language?
16:15 <mopped> hmm
16:15 <TRWBW> kilimanjaro: should check with you too, but assumed you had seen that before
16:16 <kilimanjaro> TRWBW, yea I am familiar with it, just refraining from answering so as to not spoil the fun for others
16:17 <mopped> {a, b, c, bc, cb, bbc, bcb, cbc, ccb, .., aa}?
16:17 <mopped> describing it eh, not so much
16:18 <kilimanjaro> mopped, maybe try parsing it into sort of a prefix form, you got (concat {a} (or (* {b,c}) {aa})), so from the outermost operation you know at the very least that every string starts with an a
16:19 <mopped> ahhh yeah
16:20 <TRWBW> mopped: it ends up being {a,ab,ac,abb,abc,acb,acc,...,aaa}
16:20 <mopped> so every string starts with a, ends with aa and then has b and c in between
16:20 <mopped> oh :|
16:20 <TRWBW> mopped: no, the |, the or, means it either ends with b's and c's mixed however, or it ends with aa
16:20 <mopped> aha
16:20 <mopped> i understand
16:21 <TRWBW> i'm gonna simplify the notation a bit, like most people do, and start using a, in the context of a language, to mean {a}. same with e, if it comes up.
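The operations are easy to experiment with on finite languages. A sketch using Python sets of strings, with the empty Python string `""` standing for e; the helper names `union`, `concat`, `star` and `shortlex` are illustrative. Because L* is usually infinite, `star` only lists its strings up to a chosen length, which is an assumption of this sketch and not part of the definition.

```python
def union(l1, l2):
    """L1|L2 = {w : w in L1 or w in L2}."""
    return set(l1) | set(l2)


def concat(l1, l2):
    """L1.L2 = {w1 + w2 : w1 in L1, w2 in L2}; "" stands for e."""
    return {w1 + w2 for w1 in l1 for w2 in l2}


def star(lang, max_len):
    """Strings of L* = {e}|L|(L.L)|... with length <= max_len.

    L* is usually infinite, so it can only be listed up to some length bound.
    """
    result = {""}  # e is always in L*
    frontier = {""}
    while frontier:
        # append one more string from L to each string found in the last round
        frontier = {w + x for w in frontier for x in lang
                    if len(w + x) <= max_len} - result
        result |= frontier
    return result


def shortlex(lang):
    """Sort by length, then alphabetically, for readable output."""
    return sorted(lang, key=lambda w: (len(w), w))


print("abcd" + "efg")                                    # abcdefg
print(shortlex(concat({"", "a", "abcd"}, {"cc", "b"})))  # ['b', 'ab', 'cc', 'acc', 'abcdb', 'abcdcc']
print(concat(set(), {"cc", "b"}), shortlex(concat({""}, {"cc", "b"})))  # set() ['b', 'cc']
print(shortlex(star({"a", "bb"}, 4)))
# a({b,c}*|aa), listing only the strings of length <= 3
print(shortlex(concat({"a"}, union(star({"b", "c"}, 2), {"aa"}))))
# ['a', 'ab', 'ac', 'aaa', 'abb', 'abc', 'acb', 'acc']
```

The third line of output shows why {} and {e} differ: {}.L is always {}, while {e}.L is just L.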
16:21 <TRWBW> so that would be a({b,c}*|aa)
16:22 <TRWBW> mopped: here's another one, just cause you want to get a sense before we move on, how about (a{a,b}*a)|(b{a,b}*b)
16:23 <mopped> everything starts/ends with the same letter? aa, bb, aba, bab, etc?
16:23 <TRWBW> mopped: yup, which is the same as
[diagram not preserved in this log]
16:24 <TRWBW> so regular expressions and DFAs both are ways of defining a language. kinda flip sides, one gives you a way to construct strings, one gives you a way to check them.
16:24 <TRWBW> given a DFA it may not be immediately obvious how to generate all the strings it accepts, but with a regular expression it is
16:25 <TRWBW> and vice versa, given a regular expression it may not be obvious whether it generates a given string, but with a DFA it's easy to check
16:26 <TRWBW> it turns out, perhaps surprisingly, that as methods of defining languages they are equivalent. for any regular expression there is a DFA, and for any DFA there is a regular expression, so that the DFA accepts precisely the same language as the regular expression generates
16:27 <TRWBW> so i'll give a proof of one way, that for any DFA, there is a regular expression that generates the language it accepts. it will actually be a proof in the sense of a method (although not a very efficient one) that you could apply to generate the regular expression from the DFA
16:28 <TRWBW> mopped: still with me? at least in terms of understanding what the issue is and what i'm going to be attempting to prove?
16:28 <mopped> yeah
16:29 <TRWBW> okay, are you better with me talking in terms of diagrams, or in terms of the transition function t(q,w) i defined formally before?
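To make the "flip sides" point concrete, here is a sketch that describes the language of (a{a,b}*a)|(b{a,b}*b) twice: once as a hand-built DFA (its state names are made up for this example, not taken from the missing diagram) and once as a Python `re` pattern, then compares the two on every short string. Agreement on finitely many strings is only a sanity check, not a proof of equivalence.

```python
import re
from itertools import product

# A hand-built DFA (state names are hypothetical) for (a{a,b}*a)|(b{a,b}*b).
# "A1" = only the first symbol, an a, has been read; "Aa"/"Ab" = the string
# started with a and the most recent symbol was a/b. The B* states mirror this.
T = {
    ("start", "a"): "A1", ("start", "b"): "B1",
    ("A1", "a"): "Aa", ("A1", "b"): "Ab",
    ("Aa", "a"): "Aa", ("Aa", "b"): "Ab",
    ("Ab", "a"): "Aa", ("Ab", "b"): "Ab",
    ("B1", "a"): "Ba", ("B1", "b"): "Bb",
    ("Ba", "a"): "Ba", ("Ba", "b"): "Bb",
    ("Bb", "a"): "Ba", ("Bb", "b"): "Bb",
}
ACCEPTING = {"Aa", "Bb"}


def dfa_accepts(w):
    """Checking a string: one table lookup per symbol."""
    q = "start"
    for s in w:
        q = T[(q, s)]
    return q in ACCEPTING


# The same language in Python's regex syntax; [ab] plays the role of {a,b}.
PATTERN = re.compile(r"a[ab]*a|b[ab]*b")


def regex_generates(w):
    return PATTERN.fullmatch(w) is not None


# Compare the two descriptions on every string over {a,b} of length <= 8.
mismatches = [
    w
    for n in range(9)
    for w in ("".join(p) for p in product("ab", repeat=n))
    if dfa_accepts(w) != regex_generates(w)
]
print(mismatches)  # []
```

Note that neither description accepts the one-letter strings a and b: both sides require a first and a last symbol.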
16:32 <TRWBW> kilimanjaro: k, gonna start the actual proof now
16:33 <kilimanjaro> rgr
16:34 <TRWBW> so given a machine M, and its language L(M), i'm going to generate a regular expression for L(M) in terms of other languages, actually a whole bunch of them, in such a way that they all correspond to L(M') where M' is some minor modification of M
16:34 <TRWBW> more importantly i'm going to do it so that M' always has one fewer state than M
16:34 <TRWBW> so it's an inductive proof, or if you implemented it, a recursive algorithm
16:35 <kilimanjaro> ok
16:35 <mopped> sure
16:36 <TRWBW> to simplify life later on, i'm going to assume that all the DFAs have (at least one) dead-end state. i'll define a dead-end state as one with the following properties: 1) it's not the start state, 2) it's not an accepting state, 3) all transitions from it lead back to it
16:36 <TRWBW> so when you get to a dead-end state, no matter what happens later the DFA will never accept the string being processed
16:37 <TRWBW> given any DFA you can always add a dead-end state without affecting anything, just draw a circle disconnected from the rest of the DFA and make all its transitions loop back to it
16:37 <TRWBW> i'll give a DFA so i can talk about it, sec
[diagram not preserved in this log]
16:39 <TRWBW> that's actually a DFA that accepts any string that is either e, the empty string, or that starts with a different symbol from {a,b} than it ends with. if you want to think about the language. doesn't matter, just there to be able to describe stuff.
16:39 <TRWBW> okay, now i'll describe the kinda "surgery" i meant
16:41 <TRWBW> call that M, by definition L(M)={w : t(q0,w) in {q0,qa_b,qb_a}}. say you wanted L={w: w in L(M) and w does not visit node qa_a}
16:42 <TRWBW> ugh, i'll just say it in words. when you start from a state q, and apply a string w, the DFA goes through a sequence of states to get to some final state p
16:42 <TRWBW> you can talk about whether that sequence of states visits some other state r or not
16:43 <TRWBW> so that gives you another language you can define based on a DFA M, the ones that are accepted (from start state) without visiting some forbidden state q
16:43 <TRWBW> that make more sense?
16:43 <kilimanjaro> yes
16:44 <TRWBW> anyways, you can modify M to get an M' in the following way. every transition that used to go to q, instead send it to the dead state
16:44 <kilimanjaro> so this accepts the new restricted language?
16:44 <TRWBW> then L(M') is precisely that of M with state q forbidden
16:44 <TRWBW> yup
16:45 <TRWBW> so another modification, i defined L(M) in terms of {w: t(q0,w) in A}. instead say you want all the strings w such that t(q,w)=p, for two given states q,p
16:46 <TRWBW> you can do that by generating an M' from M where you change the start state to q, and the accepting set to just be {p}
16:47 <mopped> when you say in A and in {q0,q_ab} etc, does that just mean t(q0, w) = A, ie a string that takes it from start state to an acceptance state?
16:47 <TRWBW> now neither of these modifications changes the set of states, so the size of M' (measured as number of states) stays the same
16:47 <TRWBW> yeah except it's not really =, it's membership. you can have more than one accepting, like the one i gave last does
16:47 <mopped> ok, so t(q0, w) = a member of A, got you
16:48 <TRWBW> okay, gonna try to go a bit faster, stop me if i stop making sense
16:49 <TRWBW> the overall goal is to convert a DFA into a regular expression. i'm going to just prove existence by induction on the size of the machine, defined as number of states. but it will be constructive, so you could make that an algorithm if you wanted.
16:49 <TRWBW> i'm assuming all machines have a dead state. that's not a limitation, because you can add one to M to get M', then L(M)=L(M') and if L(M') has a regular expression, so does L(M)
16:50 <TRWBW> first the base cases. they will actually be all DFAs with two states, since under this restriction the smallest a DFA can get is a start state and a dead state.
16:50 <TRWBW> so for such a machine, first see if the start state is accepting. if it isn't, your machine has no accepting states and its language is {}, the empty language
16:51 <TRWBW> if the start state is accepting, then look at the symbols in your alphabet. for each s, its transition either goes to the dead state or back to the start state (which is accepting)
16:51 <TRWBW> so in that case, L(M)=(s1|s2|..)* where s1,s2,.. are the symbols whose transitions loop back to the start state.
16:52 <TRWBW> so with me on the base cases being covered?
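Both surgeries, and the dead-end-state definition, are small transformations of the transition table. The sketch below assumes a reconstruction of the example DFA: the diagram is not in the log, so the reading qX_Y = "first symbol X, most recent symbol Y" is inferred from the state names and the accepting set {q0,qa_b,qb_a} given above, plus a disconnected dead state named `dead`. The helper names `forbid`, `retarget`, `is_dead_end` and `strings` are hypothetical. It checks each surgery against its definition by brute force over short strings.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass
class DFA:
    states: frozenset
    alphabet: str
    t: dict               # (state, symbol) -> state
    start: str
    accepting: frozenset
    dead: str = "dead"    # assumed name of the dead-end state

    def t_hat(self, q, w):
        """State reached from q after reading w."""
        for s in w:
            q = self.t[(q, s)]
        return q

    def visits(self, w):
        """All states passed through while reading w from the start state."""
        q, seen = self.start, [self.start]
        for s in w:
            q = self.t[(q, s)]
            seen.append(q)
        return seen

    def accepts(self, w):
        return self.t_hat(self.start, w) in self.accepting


def is_dead_end(m, q):
    """1) not the start state, 2) not accepting, 3) every arrow loops back to q."""
    return (q != m.start and q not in m.accepting
            and all(m.t[(q, s)] == q for s in m.alphabet))


def forbid(m, q):
    """Surgery 1: every arrow that used to go to q now goes to the dead state."""
    return replace(m, t={k: (m.dead if v == q else v) for k, v in m.t.items()})


def retarget(m, q, p):
    """Surgery 2: start at q and accept only p, giving {w : t(q, w) = p}."""
    return replace(m, start=q, accepting=frozenset({p}))


def strings(alphabet, max_len):
    """Every string over the alphabet of length <= max_len."""
    for n in range(max_len + 1):
        for p in product(alphabet, repeat=n):
            yield "".join(p)


# Reconstructed example DFA (an assumption): qX_Y = first symbol X, latest symbol Y.
t = {("q0", "a"): "qa_a", ("q0", "b"): "qb_b"}
t.update({(f"q{x}_{y}", s): f"q{x}_{s}" for x in "ab" for y in "ab" for s in "ab"})
t.update({("dead", s): "dead" for s in "ab"})  # disconnected dead state
M = DFA(frozenset({"q0", "qa_a", "qa_b", "qb_a", "qb_b", "dead"}), "ab", t,
        "q0", frozenset({"q0", "qa_b", "qb_a"}))

M1 = forbid(M, "qa_a")
M2 = retarget(M, "qa_a", "qa_b")
print(is_dead_end(M, "dead"))  # True
print(all(M1.accepts(w) == (M.accepts(w) and "qa_a" not in M.visits(w))
          for w in strings("ab", 7)))  # True
print(all(M2.accepts(w) == (M.t_hat("qa_a", w) == "qa_b")
          for w in strings("ab", 7)))  # True
print(M1.states == M.states == M2.states)  # True: the state set is unchanged
```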
16:53 <kilimanjaro> yea
16:54 <mopped> yes
16:55 <TRWBW> okay, now for any machine M, i'm going to start by splitting L(M)=(L_loop)*.L_finish, where i'll define L_loop as the language of strings that, when you start at the start state, return to the start state (once, multiple loops are taken care of by the * after), and L_finish is the language of strings that, again starting from the start state, are accepted *without* ever returning to the start state
16:55 <TRWBW> that's the key idea, so i'll pause again to give you time to read it
16:55 <TRWBW> is it clear that those two languages are well defined, and that L(M)=(L_loop)*.L_finish ?
16:56 <kilimanjaro> yes
16:57 <mopped> yes
16:58 <TRWBW> you can get a machine M' for L_finish by the surgery of forbidding the start state. that is, take all transitions that go back to the start state and instead reroute them to the dead state. that doesn't help immediately though, because you can't induct yet, since M' has the same number of states as M had
16:59 <TRWBW> the next idea is that you break it down based on what the first symbol of a string accepted by M' is.
17:00 <TRWBW> first you have to special-case the empty string. so if the start state is accepting, then e is in L_finish, and the regular expression for that case is {e}
17:01 <TRWBW> non-empty strings must start with some symbol s. for each s, you do the following, let q be the state that you get to from start by s. q isn't the start state, since we've surgically removed transitions to start
17:01 <TRWBW> now make M'' by making q the new start state, and removing the old start state from the state set.
17:02 <TRWBW> i'm claiming s.L(M'') is the language {w in L_finish: w starts with s}
17:02 <TRWBW> ..pauses..
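The split L(M)=(L_loop)*.L_finish can be seen on individual strings: cut an accepted string at every point where the run is back at the start state. Every piece before the last cut is in L_loop, and what follows the last cut is in L_finish. A sketch using the even/odd DFA from the start of the talk (chosen here because its start state actually gets revisited); the function names are illustrative. The brute-force loop only checks one direction, that every accepted string splits this way.

```python
from itertools import product

# Even/odd DFA from the start of the talk.
T = {("even", "0"): "even", ("even", "1"): "odd",
     ("odd", "0"): "odd", ("odd", "1"): "even"}
START, ACCEPTING = "even", {"odd"}


def run(w):
    """States visited while reading w from the start (states[i] = after i symbols)."""
    q, states = START, [START]
    for s in w:
        q = T[(q, s)]
        states.append(q)
    return states


def in_l_loop(x):
    """Non-empty, ends back at the start state, never passes through it in between."""
    states = run(x)
    return len(x) > 0 and states[-1] == START and START not in states[1:-1]


def in_l_finish(y):
    """Accepted from the start state without ever returning to it."""
    states = run(y)
    return states[-1] in ACCEPTING and START not in states[1:]


def split(w):
    """Cut w wherever the run is at the start state; return (loops, finish)."""
    cuts = [i for i, q in enumerate(run(w)) if q == START]  # cuts[0] is always 0
    loops = [w[a:b] for a, b in zip(cuts, cuts[1:])]
    return loops, w[cuts[-1]:]


print(split("010011"))  # (['0', '1001'], '1')

# Every accepted string of length <= 10 decomposes as claimed.
for n in range(11):
    for w in ("".join(p) for p in product("01", repeat=n)):
        if run(w)[-1] in ACCEPTING:
            loops, finish = split(w)
            assert "".join(loops) + finish == w
            assert all(in_l_loop(x) for x in loops) and in_l_finish(finish)
print("decomposition holds on all tested strings")
```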
17:03 <kilimanjaro> sounds right to me
17:04 <TRWBW> in that last DFA say, you have a transition on a from q0 to qa_a, so you would make qa_a the start state, remove q0, and then the machine accepts exactly those strings that reach an accepting state from qa_a without going through the start state
17:04 <TRWBW> (that was just an example of what i was saying, not something new)
17:05 <TRWBW> okay, so by induction L(M'') has a regular expression, call that R_s since it depended on s. that is because by removing a state, it's now smaller, and we are inducting on that.
17:06 <TRWBW> now L_finish=s1.R_s1|s2.R_s2|... where s1,s2,.. is the whole alphabet (but still finite), with |{e} stuck in too if the machine accepts the empty string (i.e. the original machine had an accepting start state)
17:08 <TRWBW> the construction for L_loop is very similar. remember L_loop={w | |w|>0, t(q0,w)=q0, and t(q0,p)!=q0 for any prefix p, 0<|p|<|w|}
17:09 <kilimanjaro> yup
17:09 <TRWBW> so cases of strings of length 1 are self-loops on the start state, for any of them you have the regular expression {s}, where t(q0,s)=q0
17:11 <TRWBW> for strings of greater length, you have a first symbol of the string, s1, and last symbol, s2, and states q1 and q2, with t(q0,s1)=q1, and t(q2,s2)=q0, and the middle part of the string transitions from q1 to q2 without passing through q0
17:13 <TRWBW> for any such pairs q1 and q2, where q0 has a transition s1 to q1, and q2 has a transition s2 to q0, you get s1.L(M').s2, where M' is the machine you get by the surgery of making q1 the start state, q2 the accepting state, and removing the old start state
17:14 <TRWBW> again, that's smaller and you can induct, again, you | together all possibilities
17:14 <TRWBW> qed i guess
17:15 <kilimanjaro> cool beans
17:15 <kilimanjaro> and the map from regexp -> dfa seems like it would be an easier argument
17:16 <|Steve|> kilimanjaro: It's actually not quite so easy to do it directly.
17:17 <kilimanjaro> what is the general argument?
17:17 <TRWBW> kilimanjaro: really you want to do NFA->DFA first, and then it becomes very easy to do regexp->NFA->DFA
17:17 <|Steve|> R1R2 is easy, R1|R2 is easy, the base cases are easy, the only one that isn't so easy is R1*.
17:17 <kilimanjaro> what is NFA?
17:18 <TRWBW> kilimanjaro: non-deterministic. i think mark-t is going to do that one.
17:18 <kilimanjaro> ahh just looked it up, instead of the transition giving you a state it gives you a set of states
17:19 <TRWBW> yeah
17:19 <TRWBW> anyways, i think that's enough DFA for one seminar. gonna go get some dinner ;)
17:20 <kilimanjaro> thanks for the talk
17:20 <mopped> yeah, thanks
17:20 <TRWBW> np ;)
17:20 <|Steve|> Good talk.
17:20 <TRWBW> thx
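Putting the pieces together, the inductive proof can be turned into a recursive program. This is a sketch under stated assumptions, not the speaker's own code. Languages are represented as Python regex strings, with `None` standing for {} and `""` for {e}. Every machine is assumed to contain a dead state literally named `dead`, and removing a state is modelled by rerouting arrows into it to `dead`. When the first symbol leads straight to the dead state, the corresponding M'' would start in the dead state, so this sketch treats that branch as the empty language. As the talk warns, the output is correct but far from minimal. The brute-force comparison at the end is a sanity check on three small DFAs (even/odd, "number of a's divisible by 3", and the example DFA reconstructed from its state names, which is itself an assumption), not a proof.

```python
import re
from itertools import product

DEAD = "dead"  # assumed name of the dead-end state present in every machine


def union(parts):
    """R1|R2|...  (None means the empty language {}, "" means {e})."""
    parts = [p for p in parts if p is not None]
    if not parts:
        return None
    nonempty = [p for p in parts if p != ""]
    if not nonempty:
        return ""
    body = "|".join(nonempty)
    return f"(?:{body})?" if "" in parts else f"(?:{body})"


def concat(parts):
    """R1.R2...  ({} absorbs everything, e is the identity)."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p != "")


def star(p):
    """R*  (both {}* and {e}* equal {e})."""
    return "" if p is None or p == "" else f"(?:{p})*"


def dfa_to_regex(alphabet, t, states, start, accepting):
    """Return a Python regex for L(M), or None when L(M) = {}."""

    def step(alive, q, s):
        target = t[(q, s)]  # arrows into removed states are rerouted to the dead state
        return target if target in alive else DEAD

    def lang(alive, q0, acc):
        if q0 == DEAD:  # M'' would start in the dead state: empty language
            return None
        if alive == frozenset({q0, DEAD}):  # base case: start state + dead state
            if q0 not in acc:
                return None
            loops = [re.escape(s) for s in alphabet if step(alive, q0, s) == q0]
            return star(union(loops))
        rest = alive - {q0}  # smaller machine: old start state removed
        # L_loop: length-1 self-loops, plus s1 . (q1 to q2 avoiding q0) . s2
        loop = [re.escape(s) for s in alphabet if step(alive, q0, s) == q0]
        for s1 in alphabet:
            q1 = step(alive, q0, s1)
            if q1 == q0:
                continue
            for q2 in sorted(rest):
                for s2 in alphabet:
                    if step(alive, q2, s2) == q0:
                        middle = lang(rest, q1, frozenset({q2}))
                        loop.append(concat([re.escape(s1), middle, re.escape(s2)]))
        # L_finish: e if q0 is accepting, plus s . L(M'') for every first symbol s
        finish = [""] if q0 in acc else []
        for s in alphabet:
            q = step(alive, q0, s)
            if q != q0:
                finish.append(concat([re.escape(s), lang(rest, q, acc & rest)]))
        return concat([star(union(loop)), union(finish)])

    return lang(frozenset(states), start, frozenset(accepting))


def with_dead(t, alphabet):
    """Add a disconnected dead state whose arrows all loop back to it."""
    return {**t, **{(DEAD, s): DEAD for s in alphabet}}


def same_language(alphabet, t, states, start, accepting, max_len=7):
    """Brute force: the DFA and the generated regex agree on all short strings."""
    pattern = dfa_to_regex(alphabet, t, states, start, accepting)
    for n in range(max_len + 1):
        for w in ("".join(p) for p in product(alphabet, repeat=n)):
            q = start
            for s in w:
                q = t[(q, s)]
            by_regex = pattern is not None and re.fullmatch(pattern, w) is not None
            if (q in accepting) != by_regex:
                return False
    return True


parity = with_dead({("even", "0"): "even", ("even", "1"): "odd",
                    ("odd", "0"): "odd", ("odd", "1"): "even"}, "01")
mod3 = with_dead({("r0", "a"): "r1", ("r1", "a"): "r2", ("r2", "a"): "r0",
                  ("r0", "b"): "r0", ("r1", "b"): "r1", ("r2", "b"): "r2"}, "ab")
# Example DFA from the talk, reconstructed from its state names (an assumption).
ex = {("q0", "a"): "qa_a", ("q0", "b"): "qb_b"}
ex.update({(f"q{x}_{y}", s): f"q{x}_{s}" for x in "ab" for y in "ab" for s in "ab"})
ex = with_dead(ex, "ab")

print(dfa_to_regex("01", parity, {"even", "odd", DEAD}, "even", {"odd"}))
print(same_language("01", parity, {"even", "odd", DEAD}, "even", {"odd"}))  # True
print(same_language("ab", mod3, {"r0", "r1", "r2", DEAD}, "r0", {"r0"}))  # True
print(same_language("ab", ex, {"q0", "qa_a", "qa_b", "qb_a", "qb_b", DEAD},
                    "q0", {"q0", "qa_b", "qb_a"}))  # True
```

Each recursive call removes the current start state, so the recursion depth is bounded by the number of states, mirroring the induction on machine size; the number of calls, and the length of the regex, can still grow exponentially.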