Copyright (C) 2015, H. Conrad Cunningham
Acknowledgements: MS student Clay McLeod assisted in preparation of these notes. These lecture notes are for use with Chapter 3 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012. The terminology and notation used in these notes are similar to those used in the Linz textbook. This document uses several figures from the Linz textbook.
Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.
Regular languages
Thus we will examine other notations for representing regular languages.
We define the syntax (or structure) of regular expressions with an inductive definition.
Linz Definition 3.1 (Regular Expression): Let be a given alphabet. Then:
, , and are all regular expressions. These are called primitive regular expressions.
If and are regular expressions, then , , , and are also regular expressions.
A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).
We use the the regular expression operators as follows:
represents the union of two regular expressions.
is the concatenation of two regular expressions.
is the star closure of a regular expression.
is the same as regular expression . It is parenthesized to express the order of operations explicitly.
For example, consider regular expression over the alphabet . Note the use of parentheses.
, , and are primitive regular expressions.
is a concatenation of regular expressions and .
is union of regular expressions and .
is the star-closure of regular expression .
As with arithmetic expressions, precedence rules and conventions can be used to relax the need for parentheses.
Star-closure () has a higher precedence (i.e., priority or binding power) than concatenation (). That is, is equal to , not .
Concatenation () higher precedence than union (). That is, is equal to , not . And, transitively, star-closure has a higher precedence than concatenation.
Concatenation operator () can usually be omitted. That is, means .
A string is not a regular expression. It cannot be generated using the above definition (as augmented by the precedence rules and convention).
But what do we "mean" by a regular expression? That is, what is its semantics.
In particular, what languages do regular expressions describe?
Consider the regular expression from above. As implied by the names for the operators, we intend this regular expression to represent the language which is .
We again give an inductive definition for the language described by a regular expression. It must consider all the cases given in the definition of regular expression itself.
Linz Definition 3.2: The language denoted by any regular expression is defined (inductively) by the following rules.
Base cases:
Inductive cases: If and are regular expressions, then
Show the language in set notation.
{ Rule 5 } | |
{ Rule 7 } | |
{ Rule 4 } | |
{ definition of star-closure of languages } | |
{ definition of concatenation of languages } | |
Consider the languages for the following regular expressions.
Consider the regular expression .
This expression denotes the set of all strings with an even number of 's followed by an odd number of 's.
In set notation, .
Show regular expressions on the alphabet for the following languages.
Regular expressions provide a convenient and concise notation for describing regular languages.
Linz Theorem 3.1 (NFAs for Regular Expressions): Let be a regular expression. Then there exists some nondeterministic finite accepter (nfa) that accepts . Consequently, is a regular language.
Proof Sketch: Show that any regular expression generated from the inductive definition corresponds to an nfa. Here we proceed informally.
Linz Figure 3.1 diagrammatically demonstrates that there are nfas that correspond to the primitive regular expressions.
Linz Figure 3.2 shows a general scheme for a nondeterministic finite accepter (nfa) that accepts , with an initial state and one final state.
Linz Figure 3.3 gives an nfa for . Note the use of -transitions to connect the two machines to the new initial and final states.
Linz Figure 3.4 shows an nfa for . Again note the use of -transitions to connect the two machines to the new initial and final states.
Linz Figure 3.5 shows an nfa for . Note the use of -transitions to represent zero-or-more repetitions of the machine and to connect it to the new initial and final states.
Thus, Linz Figures 3.3 to 3.5 illustrate composing nfas for any regular expression from the nfas for its subexpressions. Of course, the initial and final states of components are replaced by the initial and final states of the composite nfa.
Show an nfa that accepts .
Linz Figure 3.6, part (a), shows that accepts . Part (b) shows that accepts .
Linz Figure 3.7 shows an nfa that accepts .
The construction in the proof sketch and example above suggest an algorithm for converting regular expressions to nfas.
This algorithm is adapted from pages 273-4 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.
The diagrams in this section are from the Hein book, which uses a slightly different notation than the Linz book. In particular, these diagrams use capital letters for the expressions.
Algorithm to convert a regular expression to an nfa
Start with a "machine" with a single start state, a single final state, and a connecting edge labeled with the regular expression.
While there are edges labeled with regular expressions other than elements of the alphabet or apply any of the following rules that are applicable:
If an edge is labeled with , then remove the edge.
If an edge is labeled with , then replace the edge with two edges labeled with and connecting the same source and destination states.
If an edge is labeled with , the replace the edge with an edge labeled connecting the source to a new intermediate state, followed by an edge labeled connecting the intermediate state to the destination.
If an edge is labeled with , then replace the edge with a new intermediate state with a self-loop labeled with edges labeled connecting the source to the intermediate state and the intermediate state to the destination.
End of Algorithm
Rule 2 in the above algorithm can result in an unbounded number of edges originating at the same state. This makes the algorithm difficult to implement. To remedy this situation, replace Rule 2 as follows.
If an edge is labeled with , then replace the edge with subgraphs for each of and . The subgraph for consists of with a new source state connected to a new destination state with an edge labeled . Add edges labeled to connect the original source state to the new source state and the original destination state to the new destination state. Proceed similarly for .
This example is from page 275 of the Hein textbook cited above.
Construct an nfa for .
Start with a the two-state initial diagram.
Next, apply Rule 2 to .
Next, apply Rule 4 to .
Finally, apply Rule 3 to .
The construction in the proof sketch and example above suggest an algorithm for converting finita automata to regular expressions.
This algorithm is adapted from page 276 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.
Algorithm to convert a finite automaton to a regular expression
Begin with a dfba or an nfa.
Create a new start state and connect this to the original start state with an edge labeled .
Create a new final state and connect the original final states to this state by edges labeled .
For each pair of states and that has more than one edge connecting them, replace all the edges with the regular expression formed using union () to combine the labels on the previous edges.
Construct a sequence of new machines by eliminating one state at a time until the only states remaining are and . To eliminate some state , construct a new machine as follows.
Let old) represent the label on the edge on the current (i.e., old) machine.
If there is no edge , then set old.
For every pair of edges and , where and , calculate a new edge label new as follows:
new old old old old
For all other edges , where and , set:
new old.
The states of the new machine are the states of the old machine with state eliminated. The edges of the new machine are the where the new has been calculated.
After eliminating all states except and , the regular expression is the label on the one edge remaining.
End of Algorithm
This example is from pages 277-8 of the Hein textbook cited above.
Consider the following dfa.
After applying Rule 1 (new start state), Rule 2 (new final state), and Rule 3 (create union), we get the following machine.
We can eliminate state 2 readily because it is trap state. That is, there is no path through 2 between edges adjacent to 2, so new old for any states and . The resulting machine is as follows.
To eliminate state 0, we construct a new edge that is labeled as follows:
Thus, we can eliminate state 0 and its edges and add a new edge labeled .
We can eliminate state 1 by adding a new edge labeled as follows
Thus the regular expression is .
This example is from pages 277-8 of the Hein textbook cited above.
Consider the following dfa. Verify that it corresponds to the regular expression .