Notes on Models of Computation
Chapter 3
H. Conrad Cunningham
06 April 2022
Browser Advisory: The HTML version of this textbook requires a browser that supports the display of MathML. A good choice as of April 2022 is a recent version of Firefox from Mozilla.
Note: These notes were written primarily to accompany my use of Chapter 1 of the Linz textbook An Introduction to Formal Languages and Automata [[1].
Regular languages
Thus we will examine other notations for representing regular languages.
We define the syntax (or structure) of regular expressions with an inductive definition.
Linz Definition 3.1 (Regular Expression): Let be a given alphabet. Then:
, , and are all regular expressions. These are called primitive regular expressions.
If and are regular expressions, then , , , and are also regular expressions.
A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).
We use the the regular expression operators as follows:
represents the union of two regular expressions.
is the concatenation of two regular expressions.
is the star closure of a regular expression.
is the same as regular expression . It is parenthesized to express the order of operations explicitly.
For example, consider regular expression over the alphabet . Note the use of parentheses.
, , and are primitive regular expressions.
is a concatenation of regular expressions and .
is union of regular expressions and .
is the star-closure of regular expression .
As with arithmetic expressions, precedence rules and conventions can be used to relax the need for parentheses.
Star-closure () has a higher precedence (i.e., priority or binding power) than concatenation (). That is, is equal to , not .
Concatenation () higher precedence than union (). That is, is equal to , not . And, transitively, star-closure has a higher precedence than concatenation.
Concatenation operator () can usually be omitted. That is, means .
A string is not a regular expression. It cannot be generated using the above definition (as augmented by the precedence rules and convention).
But what do we “mean” by a regular expression? That is, what is its semantics.
In particular, what languages do regular expressions describe?
Consider the regular expression from above. As implied by the names for the operators, we intend this regular expression to represent the language which is .
We again give an inductive definition for the language described by a regular expression. It must consider all the cases given in the definition of regular expression itself.
Linz Definition 3.2: The language denoted by any regular expression is defined (inductively) by the following rules.
Base cases:
Inductive cases: If and are regular expressions, then
Show the language in set notation.
{ Rule 5 } | |
{ Rule 7 } | |
{ Rule 4 } | |
{ definition of star-closure of languages } | |
{ definition of concatenation of languages } | |
Consider the languages for the following regular expressions.
Consider the regular expression .
This expression denotes the set of all strings with an even number of ’s followed by an odd number of ’s.
In set notation, .
Show regular expressions on the alphabet for the following languages.
Regular expressions provide a convenient and concise notation for describing regular languages.
Linz Theorem 3.1 (NFAs for Regular Expressions): Let be a regular expression. Then there exists some nondeterministic finite accepter (nfa) that accepts . Consequently, is a regular language.
Proof Sketch: Show that any regular expression generated from the inductive definition corresponds to an nfa. Here we proceed informally.
Linz Figure 3.1 diagrammatically demonstrates that there are nfas that correspond to the primitive regular expressions.
Linz Figure 3.2 shows a general scheme for a nondeterministic finite accepter (nfa) that accepts , with an initial state and one final state.
Linz Figure 3.3 gives an nfa for . Note the use of -transitions to connect the two machines to the new initial and final states.
Linz Figure 3.4 shows an nfa for . Again note the use of -transitions to connect the two machines to the new initial and final states.
Linz Figure 3.5 shows an nfa for . Note the use of -transitions to represent zero-or-more repetitions of the machine and to connect it to the new initial and final states.
Thus, Linz Figures 3.3 to 3.5 illustrate composing nfas for any regular expression from the nfas for its subexpressions. Of course, the initial and final states of components are replaced by the initial and final states of the composite nfa.
Show an nfa that accepts .
Linz Figure 3.6, part (a), shows that accepts . Part (b) shows that accepts .
Linz Figure 3.7 shows an nfa that accepts .
The construction in the proof sketch and example above suggest an algorithm for converting regular expressions to nfas.
This algorithm is adapted from pages 273-4 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.
The diagrams in this section are from the Hein book, which uses a slightly different notation than the Linz book. In particular, these diagrams use capital letters for the expressions.
Algorithm to convert a regular expression to an nfa
Start with a “machine” with a single start state, a single final state, and a connecting edge labeled with the regular expression.
While there are edges labeled with regular expressions other than elements of the alphabet or apply any of the following rules that are applicable:
If an edge is labeled with , then remove the edge.
If an edge is labeled with , then replace the edge with two edges labeled with and connecting the same source and destination states.
If an edge is labeled with , the replace the edge with an edge labeled connecting the source to a new intermediate state, followed by an edge labeled connecting the intermediate state to the destination.
If an edge is labeled with , then replace the edge with a new intermediate state with a self-loop labeled with edges labeled connecting the source to the intermediate state and the intermediate state to the destination.
End of Algorithm
Rule 2 in the above algorithm can result in an unbounded number of edges originating at the same state. This makes the algorithm difficult to implement. To remedy this situation, replace Rule 2 as follows.
If an edge is labeled with , then replace the edge with subgraphs for each of and . The subgraph for consists of with a new source state connected to a new destination state with an edge labeled . Add edges labeled to connect the original source state to the new source state and the original destination state to the new destination state. Proceed similarly for .
This example is from page 275 of the Hein textbook cited above.
Construct an nfa for .
Start with a the two-state initial diagram.
Next, apply Rule 2 to .
Next, apply Rule 4 to .
Finally, apply Rule 3 to .
The construction in the proof sketch and example above suggest an algorithm for converting finita automata to regular expressions.
This algorithm is adapted from page 276 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.
Algorithm to convert a finite automaton to a regular expression
Begin with a dfba or an nfa.
Create a new start state and connect this to the original start state with an edge labeled .
Create a new final state and connect the original final states to this state by edges labeled .
For each pair of states and that has more than one edge connecting them, replace all the edges with the regular expression formed using union () to combine the labels on the previous edges.
Construct a sequence of new machines by eliminating one state at a time until the only states remaining are and . To eliminate some state , construct a new machine as follows.
Let old) represent the label on the edge on the current (i.e., old) machine.
If there is no edge , then set old.
For every pair of edges and , where and , calculate a new edge label new as follows:
new old old old old
For all other edges , where and , set:
new old.
The states of the new machine are the states of the old machine with state eliminated. The edges of the new machine are the where the new has been calculated.
After eliminating all states except and , the regular expression is the label on the one edge remaining.
End of Algorithm
This example is from pages 277-8 of the Hein textbook cited above.
Consider the following dfa.
After applying Rule 1 (new start state), Rule 2 (new final state), and Rule 3 (create union), we get the following machine.
We can eliminate state 2 readily because it is trap state. That is, there is no path through 2 between edges adjacent to 2, so new old for any states and . The resulting machine is as follows.
To eliminate state 0, we construct a new edge that is labeled as follows:
Thus, we can eliminate state 0 and its edges and add a new edge labeled .
We can eliminate state 1 by adding a new edge labeled as follows
Thus the regular expression is .
This example is from pages 277-8 of the Hein textbook cited above.
Consider the following dfa. Verify that it corresponds to the regular expression .
Applying Rules 1 and 2 (adding new start and final states), we get the following machine.
To eliminate state 0, we add the following new edges.
new
new
We can eliminate either state 2 or state 3 next. Let’s choose 3. Thus we create the following new edges.
new
new
Now we eliminate state 2 and thus create the new edges.
new
new
Finally, we remove state 1 by creating a new edge.
Pascal integer constants
Regular expression where
Pattern matching
Program for Pattern Matching
We can convert a regular expression to an equivalent nfa, the nfa to a dfa, and the dfa to a transition table. We can use the transition table to drive a program for pattern matching.
For a more effiicent program, we can apply the state reduction algorithm to the dfa before converting to a transition table. Linz section 2.4, which we did not cover this semester, discusses this algorithm.
We have studied two ways of describing regular languages–finite automata (i.e. dfas, nfas) and regular expressions. Here we examine a third way–regular grammars.
Linz Definition 3.3 (Right-Linear Grammar): A grammar is said to be right-linear if all productions are of one of the forms
where and .
Similarly, a grammar is said to be left-linear if all productions are of the form or .
A regular grammar is one that is either right-linear or left-linear.
The grammar , with given as
is right-linear.
The grammar , with productions
is left linear. Both and are regular grammars.
The grammar with productions
is not regular.
Although every production is either in right-linear or left-linear form, the grammar itself is neither right-linear nor left-linear, and therefore is not regular. The grammar is an example of a linear grammar.
Definition (Linear Grammar): A linear grammar is a grammar in which at most one variable can appear on the right side of any production.
Linz Theorem 3.3 (Regular Languages for Right-Linear Grammars): Let be a right-linear grammar. Then is a regular language.
Strategy: Because a regular language is any language accepted by a dfa or nfa, we seek to construct an nfa that simulates the derivations of the right-linear grammar.
The algorithm below incorporates this idea. It is based on the algorithm given on page 314 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.
Algorithm to convert a regular grammar to an nfa
Start with a right-linear grammar and construct an equivalent nfa. We label the nfa’s states primarily with variables from the grammar and label edges with terminals in the grammar or .
If necessary, transform the grammar so that all productions have the form or , where is either a terminal in the grammar or .
Label the start state of the nfa with the start symbol of the grammar.
For each production , add a state transition (edge) from a state to a state with the edge labeled with the symbol .
For each production , add a state transition (edge) from a state to a state with the edge labeled with .
If there exist productions of the form , then add a single new state symbol . For each production of the form , add a state transition from to labeled with symbol .
The final states of the nfa are plus all such there is a production of the form .
End of algorithm
Construct an nfa for the following regular grammar :
The grammar is in the correct form, so step 1 of the grammar is not applicable. The following sequence of diagrams shows the use of steps 2, 3 (three times), 5, and 6 of the algorithm. Step 4 is not applicable to this grammar.
Note that .
This is similar to the example in the Linz textbook, but we apply the algorithm as stated above.
Construct an nfa for the regular grammar :
First, let’s transform the grammar according to step 1 of the regular grammar to nfa algorithm above.
Applying steps 2, 3 (three times), 5, and 6 of algorithm as show below, we construct the following nfa. Step 4 was not applicable in this problem.
Note that .
Linz Theorem 3.4 (Right-Linear Grammars for Regular Languages): If is a regular language on the alphabet , then there exists a right-linear grammar such that .
Strategy: Reverse the construction of an nfa from a regular grammar given above.
The algorithm below incorporates this idea. It is based on the algorithm given on page 312 of the Hein textbook cited above.
Algorithm to convert an nfa to a regular grammar
Start with an nfa and construct a regular grammar.
Relabel the states of the nfa with capital letters.
Make the start state label the start symbol for the grammar.
For each transition (edge) from a state to a state labeled with an alphabetic symbol , add a production to the grammar.
For each transition (edge) from a state to a state labeled with , add a production to the grammar.
For each final state labeled , add a production to the grammar.
End of algorithm
Consider the following nfa (adapted from the Hein textbook page 313). (The Hein book uses instead of to label silent moves and empty strings.)
We apply the steps of the algorithm as follows.
The nfa states are already labeled as specified.
Choose as start symbol for grammar.
Add the following productions:
Add the following production:
Add the following production:
So, combining the above productions, we get the final grammar:
Linz Theorem 3.5 (Equivalence of Regular Languages and Left-Linear Grammars): A language is regular if and only if there exists a left-linear grammar such that .
Linz Theorem 3.6(Equivalence of Regular Languages and Right-Linear Grammars): A language is regular if and only if there exists a regular grammar such that .
The four theorems from this section enable us to convert back and forth among finite automata and regular languages as shown in Linz Figure 3.19. Remember that Linz Theorem 2.2 enabled us to translate from nfa to dfa.