Copyright (C) 2015, H. Conrad Cunningham

Acknowledgements: MS student Clay McLeod assisted in preparation of these notes. These lecture notes are for use with Chapter 3 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012. The terminology and notation used in these notes are similar to those used in the Linz textbook. This document uses several figures from the Linz textbook.

Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.

Introduction

Regular languages

Thus we will examine other notations for representing regular languages.

3.1 Regular Expressions

3.1.1 Syntax

We define the syntax (or structure) of regular expressions with an inductive definition.

Linz Definition 3.1 (Regular Expression): Let Σ\Sigma be a given alphabet. Then:

  1. \emptyset, λ\lambda, and aΣa \in \Sigma are all regular expressions. These are called primitive regular expressions.

  2. If r1r_{1} and r2r_{2} are regular expressions, then r1+r2r_{1} + r_{2}, r1r2r_{1} \cdot r_{2}, r1*r_{1}^{*}, and (r1)(r_{1}) are also regular expressions.

  3. A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).

We use the the regular expression operators as follows:

For example, consider regular expression (a+(bc))*(a + (b \cdot c))^{*} over the alphabet {a,b,c}\{a,b,c\}. Note the use of parentheses.

As with arithmetic expressions, precedence rules and conventions can be used to relax the need for parentheses.

A string (a+b+)(a + b +) is not a regular expression. It cannot be generated using the above definition (as augmented by the precedence rules and convention).

3.1.2 Languages Associated with Regular Expressions

But what do we "mean" by a regular expression? That is, what is its semantics.

In particular, what languages do regular expressions describe?

Consider the regular expression (a+(bc))*(a + (b \cdot c))^{*} from above. As implied by the names for the operators, we intend this regular expression to represent the language ({a}{bc})*(\{a\} \cup \{bc\})^{*} which is {λ,a,bc,aa,abc,bca,bcbc,aaa,aabc,bcaa,}\{\lambda, a, bc, aa, abc, bca, bcbc, aaa, aabc,bcaa, \ldots \}.

We again give an inductive definition for the language described by a regular expression. It must consider all the cases given in the definition of regular expression itself.

Linz Definition 3.2: The language L(r)L(r) denoted by any regular expression rr is defined (inductively) by the following rules.

Base cases:

  1. \emptyset is a regular expression denoting the empty set.
  2. λ\lambda is a regular expression denoting {λ}\{ \lambda \}.
  3. For every aΣa \in \Sigma, aa is a regular expression denoting {a}\{ a \}.

Inductive cases: If r1r_{1} and r2r_{2} are regular expressions, then

  1. L(r1+r2)=L(r1)L(r2)L(r_{1} + r_{2}) = L(r_{1}) \cup L(r_{2})
  2. L(r1r2)=L(r1)L(r2)L(r_{1} \cdot r_{2}) = L(r_{1})L(r_{2})
  3. L((r1))=L(r1)L((r_{1})) = L(r_{1})
  4. L(r1*)=(L(r1))*L(r_{1}^{*}) = (L(r_{1}))^{*}

3.1.3 Linz Example 3.2

Show the language L(a*(a+b))L(a^{*} \cdot (a + b)) in set notation.

L(a*(a+b))L(a^{*} \cdot (a + b))
== { Rule 5 }
L(a*)L(a+b)L(a^{*})L(a+b)
== { Rule 7 }
(L(a))*L(a+b)(L(a))^{*}L(a+b)
== { Rule 4 }
(L(a))*(L(a)L(b))(L(a))^{*}(L(a) \cup L(b))
== { definition of star-closure of languages }
{λ,a,aa,aaa,}{a,b}\{\lambda, a, aa, aaa, \ldots \}\{a, b\}
== { definition of concatenation of languages }
{a,aa,aaa,...,b,ab,aab,aaab,}\{a, aa, aaa, ..., b, ab, aab, aaab, \ldots \}

3.1.4 Examples of Languages for Regular Expressions

Consider the languages for the following regular expressions.

L(a*ba*b(a+b)*)L(a^{*} \cdot b \cdot a^{*} \cdot b \cdot (a + b)^{*})
== {a}*{b}{a}*{b}{a,b}*\{a\}^{*} \{b\} \{a\}^{*} \{b\} \{a,b\}^{*}
== {w:w{a,b}*,nb(w)2}\{ w : w \in \{ a, b \}^{*}, n_{b}(w) \geq 2 \}
L((a+b)*ba*ba*)L((a + b)^{*} \cdot b \cdot a^{*} \cdot b \cdot a^{*})
== {a,b}*{b}{a}*{b}{a}*\{a,b\}^{*} \{b\} \{a\}^{*} \{b\} \{a\}^{*}
== same as above
L((a+b)*b(a+b)*b(a+b)*)L((a+b)^{*} \cdot b \cdot (a + b)^{*} \cdot b \cdot (a + b)^{*})
== {a,b}*{b}{a,b}*{b}{a,b}*\{a,b\}^{*} \{b\} \{a,b\}^{*} \{b\} \{a,b\}^{*}
== same as above

3.1.5 Linz Example 3.4

Consider the regular expression r=(aa)*(bb)*br = (aa)^{*} (bb)^{*} b.

3.1.6 Linz Example 3.5

For Σ={0,1}\Sigma = \{ 0, 1 \}, give a regular expression rr such that
L(r)={wΣ*:wL(r) = \{w \in \Sigma^{*} : w has at least one pair of consecutive zeros }.

3.1.7 Examples of Regular Expressions for Languages

Show regular expressions on the alphabet {a,b}\{ a, b \} for the following languages.

3.2 Connection Between Regular Expressions and Regular Languages

3.2.1 Regular Expressions Denote Regular Languages

Regular expressions provide a convenient and concise notation for describing regular languages.

Linz Theorem 3.1 (NFAs for Regular Expressions): Let rr be a regular expression. Then there exists some nondeterministic finite accepter (nfa) that accepts L(r)L(r). Consequently, L(r)L(r) is a regular language.

Proof Sketch: Show that any regular expression generated from the inductive definition corresponds to an nfa. Here we proceed informally.

Linz Figure 3.1 diagrammatically demonstrates that there are nfas that correspond to the primitive regular expressions.

  1. nfa accepts \emptyset
  2. nfa accepts {λ}\{ \lambda \}
  3. nfa accepts {a}\{ a \}
Linz Fig. 3.1: Primitive Regular Expressions as NFA

Linz Fig. 3.1: Primitive Regular Expressions as NFA

Linz Figure 3.2 shows a general scheme for a nondeterministic finite accepter (nfa) that accepts L(r)L(r), with an initial state and one final state.

Linz Fig. 3.2: Scheme for NFA Accepting L(r)

Linz Fig. 3.2: Scheme for NFA Accepting L(r)

Linz Figure 3.3 gives an nfa for L(r1+r2)L(r_{1} + r_{2}). Note the use of λ\lambda-transitions to connect the two machines to the new initial and final states.

Linz Fig. 3.3: NFA for Union

Linz Fig. 3.3: NFA for Union

Linz Figure 3.4 shows an nfa for L(r1r2)L(r_{1} r_{2}). Again note the use of λ\lambda-transitions to connect the two machines to the new initial and final states.

Linz Fig. 3.4: NFA for Concatenation

Linz Fig. 3.4: NFA for Concatenation

Linz Figure 3.5 shows an nfa for L(r1*)L(r_{1}^{*}). Note the use of λ\lambda-transitions to represent zero-or-more repetitions of the machine and to connect it to the new initial and final states.

Linz Fig. 3.5: NFA for Star-Closure

Linz Fig. 3.5: NFA for Star-Closure

Thus, Linz Figures 3.3 to 3.5 illustrate composing nfas for any regular expression from the nfas for its subexpressions. Of course, the initial and final states of components are replaced by the initial and final states of the composite nfa.

3.2.2 Linz Example 3.7

Show an nfa that accepts r=(a+bb)*(ba*+λ)r = (a + bb)^{*}(ba^{*} + \lambda).

Linz Figure 3.6, part (a), shows M1M_{1} that accepts L(a+bb)L(a+bb). Part (b) shows M2M_{2} that accepts L(ba*+λ)L(ba^{*} + \lambda).

Linz Fig. 3.6: Toward a Solution to Ex. 3.6

Linz Fig. 3.6: Toward a Solution to Ex. 3.6

Linz Figure 3.7 shows an nfa that accepts L((a+bb)*(ba*+λ)L((a + bb)^{*}(ba^{*} + \lambda).

Linz Fig. 3.7: Solution for Ex. 3.6

Linz Fig. 3.7: Solution for Ex. 3.6

3.2.3 Converting Regular Expressions to Finite Automata

The construction in the proof sketch and example above suggest an algorithm for converting regular expressions to nfas.

This algorithm is adapted from pages 273-4 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.

The diagrams in this section are from the Hein book, which uses a slightly different notation than the Linz book. In particular, these diagrams use capital letters for the expressions.

Algorithm to convert a regular expression to an nfa

End of Algorithm

Rule 2 in the above algorithm can result in an unbounded number of edges originating at the same state. This makes the algorithm difficult to implement. To remedy this situation, replace Rule 2 as follows.

  1. If an edge is labeled with r+sr + s, then replace the edge with subgraphs for each of rr and ss. The subgraph for rr consists of with a new source state connected to a new destination state with an edge labeled rr. Add edges labeled λ\lambda to connect the original source state to the new source state and the original destination state to the new destination state. Proceed similarly for ss.

3.2.4 Example Conversion of Regular Expression to NFA

This example is from page 275 of the Hein textbook cited above.

Construct an nfa for a*+aba^{*} + a \cdot b.

Start with a the two-state initial diagram.

Next, apply Rule 2 to a*+aba^{*} + a \cdot b.

Next, apply Rule 4 to a*a^{*}.

Finally, apply Rule 3 to aba \cdot b.

3.2.5 Converting Finite Automata to Regular Expressions

The construction in the proof sketch and example above suggest an algorithm for converting finita automata to regular expressions.

This algorithm is adapted from page 276 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.

Algorithm to convert a finite automaton to a regular expression

Begin with a dfba or an nfa.

  1. Create a new start state ss and connect this to the original start state with an edge labeled λ\lambda.

  2. Create a new final state ff and connect the original final states to this state by edges labeled λ\lambda.

  3. For each pair of states ii and jj that has more than one edge connecting them, replace all the edges with the regular expression formed using union (++) to combine the labels on the previous edges.

  4. Construct a sequence of new machines by eliminating one state at a time until the only states remaining are ss and ff. To eliminate some state kk, construct a new machine as follows.

After eliminating all states except ss and ff, the regular expression is the label on the one edge remaining.

End of Algorithm

3.2.6 Example Conversion of Finite Automata to Regular Expressions

This example is from pages 277-8 of the Hein textbook cited above.

Consider the following dfa.

After applying Rule 1 (new start state), Rule 2 (new final state), and Rule 3 (create union), we get the following machine.

We can eliminate state 2 readily because it is trap state. That is, there is no path through 2 between edges adjacent to 2, so new(i,j)(i,j) == old(i,j)(i,j) for any states i2i \neq 2 and j2j \neq 2. The resulting machine is as follows.

To eliminate state 0, we construct a new edge that is labeled as follows:

Thus, we can eliminate state 0 and its edges and add a new edge (s,1)(s,1) labeled aa.

We can eliminate state 1 by adding a new edge (s,f)(s,f) labeled as follows

Thus the regular expression is a(a+b)*a(a + b)^{*}.

3.2.7 Another Example Conversion of Finite Automa to Regular Expressions

This example is from pages 277-8 of the Hein textbook cited above.

Consider the following dfa. Verify that it corresponds to the regular expression (a+b)*abb(a+b)^{*}abb.

Applying Rules 1 and 2 (adding new start and final states), we get the following machine.

To eliminate state 0, we add the following new edges.

We can eliminate either state 2 or state 3 next. Let's choose 3. Thus we create the following new edges.

Now we eliminate state 2 and thus create the new edges.

Finally, we remove state 1 by creating a new edge.

3.2.8 Regular Expressions for Describing Simple Patterns

Pascal integer constants

Regular expression sdd*sdd^{*} where

Pattern matching

Program for Pattern Matching

We can convert a regular expression to an equivalent nfa, the nfa to a dfa, and the dfa to a transition table. We can use the transition table to drive a program for pattern matching.

For a more effiicent program, we can apply the state reduction algorithm to the dfa before converting to a transition table. Linz section 2.4, which we did not cover this semester, discusses this algorithm.

3.3 Regular Grammars

We have studied two ways of describing regular languages--finite automata (i.e. dfas, nfas) and regular expressions. Here we examine a third way--regular grammars.

Linz Definition 3.3 (Right-Linear Grammar): A grammar G=(V,T,S,P)G = (V,T,S,P) is said to be right-linear if all productions are of one of the forms

AxBA \rightarrow xB
AxA \rightarrow x

where A,BVA, B \in V and xT*x \in T^{*}.

Similarly, a grammar is said to be left-linear if all productions are of the form ABxA \rightarrow Bx or AxA \rightarrow x.

A regular grammar is one that is either right-linear or left-linear.

3.3.1 Linz Example 3.13

The grammar G1=({S},{a,b},S,P1)G_{1} = (\{ S \}, \{ a, b \}, S, P_{1}), with P1P_{1} given as

is right-linear.

The grammar G2=({S,S1,S2},{a,b},S,P2)G_{2} = ( \{ S, S_{1}, S_{2} \}, \{ a, b \}, S, P_{2} ), with productions

is left linear. Both G1G_{1} and G2G_{2} are regular grammars.

L(G1)=L((ab)*a)L(G_{1}) = L((ab)^{*}a)

L(G2)=L(aab(ab)*)L(G_{2}) = L(aab(ab)^{*})

3.3.2 Linz Example 3.14

The grammar G=({S,A,B},{a,b},S,P)G = (\{ S, A, B \}, \{a, b \}, S, P) with productions

is not regular.

Although every production is either in right-linear or left-linear form, the grammar itself is neither right-linear nor left-linear, and therefore is not regular. The grammar is an example of a linear grammar.

Definition (Linear Grammar): A linear grammar is a grammar in which at most one variable can appear on the right side of any production.

3.3.3 Right-Linear Grammars Generate Regular Languages

Linz Theorem 3.3 (Regular Languages for Right-Linear Grammars): Let G=(V,T,S,P)G = (V, T, S, P) be a right-linear grammar. Then L(G)L(G) is a regular language.

Strategy: Because a regular language is any language accepted by a dfa or nfa, we seek to construct an nfa that simulates the derivations of the right-linear grammar.

The algorithm below incorporates this idea. It is based on the algorithm given on page 314 of the book: James L. Hein, Theory of Computation: An Introduction, Jones and Bartlett, 1996.

Algorithm to convert a regular grammar to an nfa

Start with a right-linear grammar and construct an equivalent nfa. We label the nfa's states primarily with variables from the grammar and label edges with terminals in the grammar or λ\lambda.

  1. If necessary, transform the grammar so that all productions have the form AxA \rightarrow x or AxBA \rightarrow xB, where xx is either a terminal in the grammar or λ\lambda.

  2. Label the start state of the nfa with the start symbol of the grammar.

  3. For each production IaJI \rightarrow aJ, add a state transition (edge) from a state II to a state JJ with the edge labeled with the symbol aa.

  4. For each production IJI \rightarrow J, add a state transition (edge) from a state II to a state JJ with the edge labeled with λ\lambda.

  5. If there exist productions of the form IaI \rightarrow a, then add a single new state symbol FF. For each production of the form IaI \rightarrow a, add a state transition from II to FF labeled with symbol aa.

  6. The final states of the nfa are FF plus all II such there is a production of the form IλI \rightarrow \lambda.

End of algorithm

3.3.4 Example: Converting Regular Grammar to NFA

Construct an nfa for the following regular grammar GG:

The grammar is in the correct form, so step 1 of the grammar is not applicable. The following sequence of diagrams shows the use of steps 2, 3 (three times), 5, and 6 of the algorithm. Step 4 is not applicable to this grammar.

Note that L(G)=L(a*ba*a)L(G) = L(a^{*}ba^{*}a).

3.3.5 Linz Example 3.5

This is similar to the example in the Linz textbook, but we apply the algorithm as stated above.

Construct an nfa for the regular grammar GG:

First, let's transform the grammar according to step 1 of the regular grammar to nfa algorithm above.

Applying steps 2, 3 (three times), 5, and 6 of algorithm as show below, we construct the following nfa. Step 4 was not applicable in this problem.

Note that L(G)=L((aab)*ab)L(G) = L((aab)^{*}ab).

3.3.6 Right-Linear Grammars for Regular Languages

Linz Theorem 3.4 (Right-Linear Grammars for Regular Languages): If LL is a regular language on the alphabet Σ\Sigma, then there exists a right-linear grammar G=(V,Σ,S,P)G = (V, \Sigma, S, P) such that L=L(G)L = L(G).

Strategy: Reverse the construction of an nfa from a regular grammar given above.

The algorithm below incorporates this idea. It is based on the algorithm given on page 312 of the Hein textbook cited above.

Algorithm to convert an nfa to a regular grammar

Start with an nfa and construct a regular grammar.

  1. Relabel the states of the nfa with capital letters.

  2. Make the start state label the start symbol for the grammar.

  3. For each transition (edge) from a state II to a state JJ labeled with an alphabetic symbol aa, add a production IaJI \rightarrow aJ to the grammar.

  4. For each transition (edge) from a state II to a state JJ labeled with λ\lambda, add a production IJI \rightarrow J to the grammar.

  5. For each final state labeled KK, add a production KλK \rightarrow \lambda to the grammar.

End of algorithm

3.3.7 Example: Converting NFA to Regular Grammar

Consider the following nfa (adapted from the Hein textbook page 313). (The Hein book uses Λ\Lambda instead of λ\lambda to label silent moves and empty strings.)

We apply the steps of the algorithm as follows.

  1. The nfa states are already labeled as specified.

  2. Choose SS as start symbol for grammar.

  3. Add the following productions:

  4. Add the following production:

  5. Add the following production:

So, combining the above productions, we get the final grammar:

3.3.8 Equivalence Between Regular Languages and Regular Grammars

Linz Theorem 3.5 (Equivalence of Regular Languages and Left-Linear Grammars): A language LL is regular if and only if there exists a left-linear grammar GG such that L=L(G)L=L(G).

Linz Theorem 3.6(Equivalence of Regular Languages and Right-Linear Grammars): A language LL is regular if and only if there exists a regular grammar GG such that L=L(G)L = L(G).

The four theorems from this section enable us to convert back and forth among finite automata and regular languages as shown in Linz Figure 3.19. Remember that Linz Theorem 2.2 enabled us to translate from nfa to dfa.

Linz Fig. 3.19: Equivalence of Regular Languages and Regular Grammars

Linz Fig. 3.19: Equivalence of Regular Languages and Regular Grammars