Copyright (C) 2015, H. Conrad Cunningham

Acknowledgements: MS student Clay McLeod assisted in preparation of these notes. These lecture notes are for use with Chapter 3 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012. The terminology and notation used in these notes are similar to those used in the Linz textbook. This document uses several figures from the Linz textbook.

Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.

Introduction

Regular languages

Thus we will examine other notations for representing regular languages.

3.1 Regular Expressions

3.1.1 Syntax

We define the syntax (or structure) of regular expressions with an inductive definition.

Linz Definition 3.1 (Regular Expression): Let Σ\Sigma be a given alphabet. Then:

  1. \emptyset, λ\lambda, and aΣa \in \Sigma are all regular expressions. These are called primitive regular expressions.

  2. If r1r_{1} and r2r_{2} are regular expressions, then r1+r2r_{1} + r_{2}, r1r2r_{1} \cdot r_{2}, r1*r_{1}^{*}, and (r1)(r_{1}) are also regular expressions.

  3. A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).

We use the the regular expression operators as follows:

For example, consider regular expression (a+(bc))*(a + (b \cdot c))^{*} over the alphabet {a,b,c}\{a,b,c\}. Note the use of parentheses.

As with arithmetic expressions, precedence rules and conventions can be used to relax the need for parentheses.

A string (a+b+)(a + b +) is not a regular expression. It cannot be generated using the above definition (as augmented by the precedence rules and convention).

3.1.2 Languages Associated with Regular Expressions

But what do we "mean" by a regular expression? That is, what is its semantics.

In particular, what languages do regular expressions describe?

Consider the regular expression (a+(bc))*(a + (b \cdot c))^{*} from above. As implied by the names for the operators, we intend this regular expression to represent the language ({a}{bc})*(\{a\} \cup \{bc\})^{*} which is {λ,a,bc,aa,abc,bca,bcbc,aaa,aabc,bcaa,}\{\lambda, a, bc, aa, abc, bca, bcbc, aaa, aabc,bcaa, \ldots \}.

We again give an inductive definition for the language described by a regular expression. It must consider all the cases given in the definition of regular expression itself.

Linz Definition 3.2: The language L(r)L(r) denoted by any regular expression rr is defined (inductively) by the following rules.

Base cases:

  1. \emptyset is a regular expression denoting the empty set.
  2. λ\lambda is a regular expression denoting {λ}\{ \lambda \}.
  3. For every aΣa \in \Sigma, aa is a regular expression denoting {a}\{ a \}.

Inductive cases: If r1r_{1} and r2r_{2} are regular expressions, then

  1. L(r1+r2)=L(r1)L(r2)L(r_{1} + r_{2}) = L(r_{1}) \cup L(r_{2})
  2. L(r1r2)=L(r1)L(r2)L(r_{1} \cdot r_{2}) = L(r_{1})L(r_{2})
  3. L((r1))=L(r1)L((r_{1})) = L(r_{1})
  4. L(r1*)=(L(r1))*L(r_{1}^{*}) = (L(r_{1}))^{*}

3.1.3 Linz Example 3.2

Show the language L(a*(a+b))L(a^{*} \cdot (a + b)) in set notation.

L(a*(a+b))L(a^{*} \cdot (a + b))
== { Rule 5 }
L(a*)L(a+b)L(a^{*})L(a+b)
== { Rule 7 }
(L(a))*L(a+b)(L(a))^{*}L(a+b)
== { Rule 4 }
(L(a))*(L(a)L(b))(L(a))^{*}(L(a) \cup L(b))
== { definition of star-closure of languages }
{λ,a,aa,aaa,}{a,b}\{\lambda, a, aa, aaa, \ldots \}\{a, b\}
== { definition of concatenation of languages }
{a,aa,aaa,...,b,ab,aab,aaab,}\{a, aa, aaa, ..., b, ab, aab, aaab, \ldots \}

3.1.4 Examples of Languages for Regular Expressions

Consider the languages for the following regular expressions.

L(a*ba*b(a+b)*)L(a^{*} \cdot b \cdot a^{*} \cdot b \cdot (a + b)^{*})
== {a}*{b}{a}*{b}{a,b}*\{a\}^{*} \{b\} \{a\}^{*} \{b\} \{a,b\}^{*}
== {w:w{a,b}*,nb(w)2}\{ w : w \in \{ a, b \}^{*}, n_{b}(w) \geq 2 \}
L((a+b)*ba*ba*)L((a + b)^{*} \cdot b \cdot a^{*} \cdot b \cdot a^{*})
== {a,b}*{b}{a}*{b}{a}*\{a,b\}^{*} \{b\} \{a\}^{*} \{b\} \{a\}^{*}
== same as above
L((a+b)*b(a+b)*b(a+b)*)L((a+b)^{*} \cdot b \cdot (a + b)^{*} \cdot b \cdot (a + b)^{*})
== {a,b}*{b}{a,b}*{b}{a,b}*\{a,b\}^{*} \{b\} \{a,b\}^{*} \{b\} \{a,b\}^{*}
== same as above

3.1.5 Linz Example 3.4

Consider the regular expression r=(aa)*(bb)*br = (aa)^{*} (bb)^{*} b.

3.1.6 Linz Example 3.5

For Σ={0,1}\Sigma = \{ 0, 1 \}, give a regular expression rr such that
L(r)={wΣ*:wL(r) = \{w \in \Sigma^{*} : w has at least one pair of consecutive zeros }.

3.1.7 Examples of Regular Expressions for Languages

Show regular expressions on the alphabet {a,b}\{ a, b \} for the following languages.

3.2 Connection Between Regular Expressions and Regular Languages

3.2.1 Regular Expressions Denote Regular Languages

Regular expressions provide a convenient and concise notation for describing regular languages.

Linz Theorem 3.1 (NFAs for Regular Expressions): Let rr be a regular expression. Then there exists some nondeterministic finite accepter (nfa) that accepts L(r)L(r). Consequently, L(r)L(r) is a regular language.

Proof Sketch: Show that any regular expression generated from the inductive definition corresponds to an nfa. Here we proceed informally.

Linz Figure 3.1 diagrammatically demonstrates that there are nfas that correspond to the primitive regular expressions.

  1. nfa accepts \emptyset
  2. nfa accepts {λ}\{ \lambda \}
  3. nfa accepts {a}\{ a \}
Linz Fig. 3.1: Primitive Regular Expressions as NFA

Linz Fig. 3.1: Primitive Regular Expressions as NFA

Linz Figure 3.2 shows a general scheme for a nondeterministic finite accepter (nfa) that accepts L(r)L(r), with an initial state and one final state.

Linz Fig. 3.2: Scheme for NFA Accepting L(r)

Linz Fig. 3.2: Scheme for NFA Accepting L(r)

Linz Figure 3.3 gives an nfa for L(r1+r2)L(r_{1} + r_{2}). Note the use of λ\lambda-transitions to connect the two machines to the new initial and final states.

Linz Fig. 3.3: NFA for Union

Linz Fig. 3.3: NFA for Union

Linz Figure 3.4 shows an nfa for L(r1r2)L(r_{1} r_{2}). Again note the use of λ\lambda-transitions to connect the two machines to the new initial and final states.

Linz Fig. 3.4: NFA for Concatenation

Linz Fig. 3.4: NFA for Concatenation

Linz Figure 3.5 shows an nfa for L(r1*)L(r_{1}^{*}). Note the use of λ\lambda-transitions to represent zero-or-more repetitions of the machine and to connect it to the new initial and final states.

Linz Fig. 3.5: NFA for Star-Closure

Linz Fig. 3.5: NFA for Star-Closure

Thus, Linz Figures 3.3 to 3.5 illustrate composing nfas for any regular expression from the nfas for its subexpressions. Of course, the initial and final states of components are replaced by the initial and final states of the composite nfa.

3.2.2 Linz Example 3.7

Show an nfa that accepts r=(a+bb)*(ba*+λ)r = (a + bb)^{*}(ba^{*} + \lambda).

Linz Figure 3.6, part (a), shows M1M_{1} that accepts L(a+bb)L(a+bb). Part (b) shows M2M_{2} that accepts L(ba*+λ)L(ba^{*} + \lambda).

Linz Fig. 3.6: Toward a Solution to Ex. 3.6

Linz Fig. 3.6: Toward a Solution to Ex. 3.6

Linz Figure 3.7 shows an nfa that accepts L((a+bb)*(ba*+λ)L((a + bb)^{*}(ba^{*} + \lambda).

Linz Fig. 3.7: Solution for Ex. 3.6

Linz Fig. 3.7: Solution for Ex. 3.6

3.2.3 Converting Regular Expressions to Finite Automata

The construction in the proof sketch and example above suggest an algorithm for converting regular expressions to nfas.

This algorithm is adapted from pages 273-4 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.

The diagrams in this section are from the Hein book, which uses a slightly different notation than the Linz book. In particular, these diagrams use capital letters for the expressions.

Algorithm to convert a regular expression to an nfa

End of Algorithm

Rule 2 in the above algorithm can result in an unbounded number of edges originating at the same state. This makes the algorithm difficult to implement. To remedy this situation, replace Rule 2 as follows.

  1. If an edge is labeled with r+sr + s, then replace the edge with subgraphs for each of rr and ss. The subgraph for rr consists of with a new source state connected to a new destination state with an edge labeled rr. Add edges labeled λ\lambda to connect the original source state to the new source state and the original destination state to the new destination state. Proceed similarly for ss.

3.2.4 Example Conversion of Regular Expression to NFA

This example is from page 275 of the Hein textbook cited above.

Construct an nfa for a*+aba^{*} + a \cdot b.

Start with a the two-state initial diagram.

Next, apply Rule 2 to a*+aba^{*} + a \cdot b.

Next, apply Rule 4 to a*a^{*}.

Finally, apply Rule 3 to aba \cdot b.

3.2.5 Converting Finite Automata to Regular Expressions

The construction in the proof sketch and example above suggest an algorithm for converting finita automata to regular expressions.

This algorithm is adapted from page 276 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.

Algorithm to convert a finite automaton to a regular expression

Begin with a dfba or an nfa.

  1. Create a new start state ss and connect this to the original start state with an edge labeled λ\lambda.

  2. Create a new final state ff and connect the original final states to this state by edges labeled λ\lambda.

  3. For each pair of states ii and jj that has more than one edge connecting them, replace all the edges with the regular expression formed using union (++) to combine the labels on the previous edges.

  4. Construct a sequence of new machines by eliminating one state at a time until the only states remaining are ss and ff. To eliminate some state kk, construct a new machine as follows.

After eliminating all states except ss and ff, the regular expression is the label on the one edge remaining.

End of Algorithm

3.2.6 Example Conversion of Finite Automata to Regular Expressions

This example is from pages 277-8 of the Hein textbook cited above.

Consider the following dfa.

After applying Rule 1 (new start state), Rule 2 (new final state), and Rule 3 (create union), we get the following machine.

We can eliminate state 2 readily because it is trap state. That is, there is no path through 2 between edges adjacent to 2, so new(i,j)(i,j) == old(i,j)(i,j) for any states i2i \neq 2 and j2j \neq 2. The resulting machine is as follows.

To eliminate state 0, we construct a new edge that is labeled as follows:

Thus, we can eliminate state 0 and its edges and add a new edge (s,1)(s,1) labeled aa.

We can eliminate state 1 by adding a new edge (s,f)(s,f) labeled as follows

Thus the regular expression is a(a+b)*a(a + b)^{*}.

3.2.7 Another Example Conversion of Finite Automa to Regular Expressions

This example is from pages 277-8 of the Hein textbook cited above.

Consider the following dfa. Verify that it corresponds to the regular expression (a+b)*abb(a+b)^{*}abb.