Notes on Models of Computation
Chapter 4

H. Conrad Cunningham

06 April 2022

Copyright (C) 2015, 2022, H. Conrad Cunningham
Professor of Computer and Information Science
University of Mississippi
214 Weir Hall
P.O. Box 1848
University, MS 38677
(662) 915-7396 (dept. office)


Note: These notes were written primarily to accompany my use of Chapter 4 of the Linz textbook An Introduction to Formal Languages and Automata [1].

4 Properties of Regular Languages

The questions answered in this chapter include:

The concepts introduced in this chapter are:

4.1 Closure Properties of Regular Languages

4.1.1 Mathematical Interlude: Operations and Closure

Definition (Operation): An operation is a function $p : V \rightarrow Y$ where $V \subseteq X_{1} \times X_{2} \times \cdots \times X_{k}$ for some sets $X_{i}$ with $1 \leq i \leq k$. Here $k$ is the number of operands (or arguments) of the operation.

  • If $k = 0$, then $p$ is a nullary operation.
  • If $k = 1$, then $p$ is a unary operation.
  • If $k = 2$, then $p$ is a binary operation.
  • etc.

We often use special notation and conventions for unary and binary operations. For example:

  • a binary operation may be written in an infix style as in $x + y$ and $x \cdot y$

  • a unary operation may be written in a prefix style as in $-x$, suffix style such as $x^{*}$, or special style such as $\sqrt{3}$ or $\bar{S}$

  • a binary operation may be implied by juxtaposition such as $3x$ for multiplication or (in a different context) $xy$ for string concatenation or implied by superscripting such as $x^{2}$ for exponentiation

Often we consider an operation on a set, where all the operands and the result are drawn from the same set.

Definition (Closure): A set $S$ is closed under a unary operation $p$ if, for all $x \in S$, $p(x) \in S$. Similarly, a set $S$ is closed under a binary operation $\odot$ if, for all $x \in S$ and $y \in S$, $x \odot y \in S$.

Examples: arithmetic on the set of natural numbers ($\mathbb{N} = \{0, 1, ...\}$)

  • Binary operations addition ($+$) and multiplication ($*$ in programming languages) are closed on $\mathbb{N}$

    • $\forall x, y \in \mathbb{N}, x + y \in \mathbb{N}$
    • $\forall x, y \in \mathbb{N}, x * y \in \mathbb{N}$
  • Binary operations subtraction ($-$) and division ($/$) are not closed on $\mathbb{N}$

    • $\exists x, y \in \mathbb{N}, x - y \notin \mathbb{N}$
      For example, $1 - 2$ is not a natural number.

    • $\exists x, y \in \mathbb{N}, x / y \notin \mathbb{N}$
      For example, $3 / 2$ is not a natural number.

  • Unary operation negation (operator $-$ written in prefix form) is not closed on $\mathbb{N}$.

However, the set of integers is closed under subtraction and negation. But it is not closed under division or square root (as we normally define the operations).
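The closure examples above can be explored with a small Python sketch. The helper name `closure_counterexample` is hypothetical, and the check runs over a finite sample of $\mathbb{N}$ only, so it can refute closure by exhibiting a counterexample but cannot prove closure.

```python
from itertools import product

def closure_counterexample(sample, op):
    """Return a pair (x, y) whose result is not a natural number, or None."""
    for x, y in product(sample, repeat=2):
        try:
            result = op(x, y)
        except ZeroDivisionError:
            continue  # division by zero is undefined, so skip that pair
        if not (result == int(result) and result >= 0):
            return (x, y)
    return None

naturals = range(0, 6)  # finite sample of the natural numbers
add_cx = closure_counterexample(naturals, lambda x, y: x + y)
sub_cx = closure_counterexample(naturals, lambda x, y: x - y)
div_cx = closure_counterexample(naturals, lambda x, y: x / y)
print(add_cx, sub_cx, div_cx)
```

No counterexample is found for addition on this sample, while subtraction and division each yield a pair whose result falls outside the natural numbers.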

Now, let’s consider closure of the set of regular languages with respect to the simple set operations.

4.1.2 Closure under Simple Set Operations

Linz Theorem 4.1 (Closure under Simple Set Operations): If $L_{1}$ and $L_{2}$ are regular languages, then so are $L_{1} \cup L_{2}$, $L_{1} \cap L_{2}$, $L_{1}L_{2}$, $\bar{L_{1}}$, and $L_{1}^{*}$.

That is, we say that the family of regular languages is closed under union, intersection, concatenation, complementation, and star-closure.

Proof of $L_{1} \cup L_{2}$

Let $L_{1}$ and $L_{2}$ be regular languages.

$L_{1} \cup L_{2}$
$=$ { Th. 3.2: there exist regular expressions $r_{1}$, $r_{2}$ }
$L(r_{1}) \cup L(r_{2})$
$=$ { Def. 3.2, rule 4 }
$L(r_{1} + r_{2})$

Thus, by Theorem 3.1 (regular expressions describe regular languages), $L_{1} \cup L_{2}$ is a regular language. QED.

Proofs of $L_{1}L_{2}$ and $L_{1}^{*}$

Similar to the proof of $L_{1} \cup L_{2}$.

Proof of $\bar{L_{1}}$

Strategy: Given a dfa $M$ for the regular language, construct a new dfa $\widehat{M}$ that accepts everything rejected and rejects everything accepted by the given dfa.

$L_{1}$ is a regular language on $\Sigma$.
$\equiv$ { Def. 2.3 }
$\exists$ dfa $M = (Q,\Sigma,\delta,q_{0},F)$ such that $L(M)=L_{1}$.

Thus

$\omega \in \Sigma^{*}$
$\Rightarrow$ { by the properties of dfas and sets }
Either $\delta^{*}(q_{0},\omega)\in F$ or $\delta^{*}(q_{0},\omega)\in Q-F$
$\Rightarrow$ { Def. 2.2: language accepted by dfa }
Either $\omega \in L(M)$ or $\omega \in L(\widehat{M})$ for some dfa $\widehat{M}$

Let’s construct dfa $\widehat{M} = (Q, \Sigma, \delta, q_{0}, Q-F)$.

Clearly, $L(\widehat{M}) = \bar{L_{1}}$. Thus $\bar{L_{1}}$ is a regular language. QED.
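The complement construction is short enough to sketch in Python. The dictionary encoding of the transition function and the example dfa (strings over {a, b} that end in b) are assumptions made for illustration; the construction itself only swaps final and nonfinal states, and it relies on the transition function being total.

```python
def run_dfa(delta, start, finals, w):
    """Simulate a dfa with a total transition function on the string w."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def complement_dfa(states, delta, start, finals):
    """Keep Q, delta, and q0; replace F by Q - F."""
    return states, delta, start, states - finals

# Hypothetical dfa: accepts strings over {a, b} that end in b.
states = {"q0", "q1"}
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
start, finals = "q0", {"q1"}

c_states, c_delta, c_start, c_finals = complement_dfa(states, delta, start, finals)
print(run_dfa(delta, start, finals, "ab"), run_dfa(c_delta, c_start, c_finals, "ab"))
```

Every string is accepted by exactly one of the two machines, which is the property the proof relies on.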

Proof of $L_{1} \cap L_{2}$

Strategy: Given two dfas for the two regular languages, construct a new dfa that accepts a string if and only if both original dfas accept the string.

Let $L_{1} = L(M_{1})$ and $L_{2} = L(M_{2})$ for dfas:

$M_{1} = (Q, \Sigma, \delta_{1}, q_{0}, F_{1})$

$M_{2} = (P, \Sigma, \delta_{2}, p_{0}, F_{2})$

Construct $\widehat{M} = (\widehat{Q}, \Sigma, \widehat{\delta}, (q_{0}, p_{0}), \widehat{F})$, where

$\widehat{Q} = Q \times P$

$\widehat{\delta}((q_{i}, p_{j}), a) = (q_{k}, p_{l})$ when

$\delta_{1}(q_{i}, a) = q_{k}$

$\delta_{2}(p_{j}, a) = p_{l}$

$\widehat{F} = \{ (q,p) : q \in F_{1}, p \in F_{2} \}$

Clearly, $\omega \in L_{1} \cap L_{2}$ if and only if $\omega$ is accepted by $\widehat{M}$.

Thus, $L_{1} \cap L_{2}$ is regular. QED.
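The product construction in this proof can be sketched as follows. The two example dfas (an even number of a’s, and strings ending in b) and the function names are hypothetical; the code assumes both transition functions are total over the shared alphabet.

```python
def run_dfa(delta, start, finals, w):
    """Simulate a dfa with a total transition function on the string w."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def intersect_dfa(alphabet, d1, s1, f1, d2, s2, f2):
    """Build the product dfa whose states are pairs (q, p)."""
    states1 = {q for (q, _) in d1}
    states2 = {p for (p, _) in d2}
    delta = {((q, p), a): (d1[(q, a)], d2[(p, a)])
             for q in states1 for p in states2 for a in alphabet}
    finals = {(q, p) for q in f1 for p in f2}
    return delta, (s1, s2), finals

# Hypothetical M1: even number of a's.  Hypothetical M2: ends in b.
d1 = {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}
d2 = {("p0", "a"): "p0", ("p0", "b"): "p1", ("p1", "a"): "p0", ("p1", "b"): "p1"}
delta, start, finals = intersect_dfa("ab", d1, "e", {"e"}, d2, "p0", {"p1"})
print(run_dfa(delta, start, finals, "aab"), run_dfa(delta, start, finals, "ab"))
```

The product machine runs both dfas in lockstep, so it accepts exactly when both components end in a final state.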

The previous proof is constructive.

  • It establishes the desired result.

  • It provides an algorithm for building an item of interest (e.g., a dfa to accept $L_{1} \cap L_{2}$).

Sometimes nonconstructive proofs are shorter and easier to understand. But they provide no algorithm.

Alternate (nonconstructive) proof for $L_{1} \cap L_{2}$

$L_{1}$ and $L_{2}$ are regular.
$\equiv$ { previously proved part of Theorem 4.1 }
$\bar{L_{1}}$ and $\bar{L_{2}}$ are regular.
$\Rightarrow$ { previously proved part of Theorem 4.1 }
$\bar{L_{1}} \cup \bar{L_{2}}$ is regular
$\Rightarrow$ { previously proved part of Theorem 4.1 }
$\overline{\bar{L_{1}} \cup \bar{L_{2}}}$ is regular
$\equiv$ { De Morgan’s Law for sets }
$L_{1} \cap L_{2}$ is regular

QED.

4.1.3 Closure under Difference (Linz Example 4.1)

Consider the difference between two regular languages $L_{1}$ and $L_{2}$, written $L_{1} - L_{2}$.

But this is just set difference, which is defined $L_{1} - L_{2} = L_{1} \cap \bar{L_{2}}$.

From Theorem 4.1 above, we know that regular languages are closed under both complementation and intersection. Thus, regular languages are closed under difference as well.

4.1.4 Closure under Reversal

Linz Theorem 4.2 (Closure under Reversal): The family of regular languages is closed under reversal.

Proof (constructive)

Strategy: Construct an nfa for the regular language and then reverse all the edges and exchange roles of the initial and final states.

Let $L_{1}$ be a regular language. Construct an nfa $M$ such that $L_{1} = L(M)$ and $M$ has a single final state. (We can add $\lambda$ transitions from the previous final states to create a single new final state.)

Now construct a new nfa $\hat{M}$ as follows.

  • Make the initial state of $M$ the final state of $\hat{M}$.
  • Make the final state of $M$ the initial state of $\hat{M}$.
  • Reverse the direction of all edges of $M$ keeping the same labels and add the edges to $\hat{M}$.

Thus nfa $\hat{M}$ accepts $\omega^{R} \in \Sigma^{*}$ if and only if the original nfa accepts $\omega \in \Sigma^{*}$. QED.
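A minimal sketch of the reversal construction follows. The edge-set encoding (triples `(source, label, target)` with the empty string standing for $\lambda$), the function names, and the example nfa for $L(ab^{*})$ are all assumptions chosen for illustration.

```python
def lambda_closure(edges, states):
    """All states reachable from the given states using only lambda edges."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (src, label, dst) in edges:
            if src == q and label == "" and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def nfa_accepts(edges, start, final, w):
    """Simulate an nfa with a single final state by tracking a set of states."""
    current = lambda_closure(edges, {start})
    for symbol in w:
        moved = {dst for (src, label, dst) in edges
                 if src in current and label == symbol}
        current = lambda_closure(edges, moved)
    return final in current

def reverse_nfa(edges, start, final):
    """Reverse every edge and exchange the initial and final states."""
    reversed_edges = {(dst, label, src) for (src, label, dst) in edges}
    return reversed_edges, final, start

# Hypothetical nfa with one final state for L(ab*).
edges = {("s", "a", "t"), ("t", "b", "t")}
r_edges, r_start, r_final = reverse_nfa(edges, "s", "t")
print(nfa_accepts(edges, "s", "t", "abb"), nfa_accepts(r_edges, r_start, r_final, "bba"))
```

The reversed machine accepts a string exactly when the original accepts its reversal, as the theorem states.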

4.1.5 Homomorphism Definition

In mathematics, a homomorphism is a mapping between two mathematical structures that preserves the essential structure.

Linz Definition 4.1 (Homomorphism): Suppose $\Sigma$ and $\Gamma$ are alphabets. A function

$h: \Sigma \rightarrow \Gamma^{*}$

is called a homomorphism.

In words, a homomorphism is a substitution in which a single letter is replaced with a string.

We can extend the domain of the function $h$ to strings in an obvious fashion. If

$w = a_{1}a_{2} \cdots a_{n}$ for $n \geq 0$

then

$h(w) = h(a_{1})h(a_{2}) \cdots h(a_{n})$.

If $L$ is a language on $\Sigma$, then we define its homomorphic image as

$h(L) = \{ h(w) : w \in L \}$.

Note: The homomorphism function $h$ preserves the essential structure of the language. In particular, it preserves the concatenation operation on strings, i.e., $h(\lambda) = \lambda$ and $h(uv) = h(u)h(v)$.

4.1.6 Linz Example 4.2

Let $\Sigma = \{ a, b \}$ and $\Gamma = \{ a, b, c \}$.

Define $h$ as follows:

$h(a) = ab$

$h(b) = bbc$

Then $h(aba) = abbbcab$.

The homomorphic image of $L = \{ aa, aba \}$ is the language $h(L) = \{ abab, abbbcab \}$.

If we have a regular expression $r$ for a language $L$, then a regular expression for $h(L)$ can be obtained by simply applying the homomorphism to each $\Sigma$ symbol of $r$. We show this in the next example.
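The computation in this example can be reproduced with a few lines of Python. Representing $h$ as a dictionary and the names `apply_hom` and `image` are illustrative assumptions, not notation from Linz.

```python
def apply_hom(h, w):
    """Extend a symbol-to-string map h to whole strings by concatenation."""
    return "".join(h[symbol] for symbol in w)

def image(h, language):
    """Homomorphic image of a finite language given as a set of strings."""
    return {apply_hom(h, w) for w in language}

h = {"a": "ab", "b": "bbc"}
print(apply_hom(h, "aba"))
print(sorted(image(h, {"aa", "aba"})))
```

Because `apply_hom` concatenates the images symbol by symbol, it maps the empty string to the empty string and satisfies $h(uv) = h(u)h(v)$ by construction.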

4.1.7 Linz Example 4.3

For $\Sigma = \{ a,b \}$ and $\Gamma = \{ b, c, d \}$, define $h$:

$h(a) = dbcc$

$h(b) = bdc$

If $L$ is a regular language denoted by the regular expression

$r = (a + b^{*})(aa)^{*}$

then

$r_{1} = (dbcc + (bdc)^{*})(dbccdbcc)^{*}$

denotes the regular language $h(L)$.

The general result on the closure of regular languages under any homomorphism follows from this example in an obvious manner.

4.1.8 Closure under Homomorphism Theorem

Linz Theorem 4.3 (Closure under Homomorphism): Let $h$ be a homomorphism. If $L$ is a regular language, then its homomorphic image $h(L)$ is also regular.

Proof: Similar to the argument in Example 4.3. See Linz textbook for full proof.

The family of regular languages is therefore closed under arbitrary homomorphisms.

4.1.9 Right Quotient Definition

Linz Definition 4.2 (Right Quotient): Let $L_{1}$ and $L_{2}$ be languages on the same alphabet. Then the right quotient of $L_{1}$ with $L_{2}$ is defined as

$L_{1} / L_{2} = \{ x : xy \in L_{1} \text{ for some } y \in L_{2} \}$

4.1.10 Linz Example 4.4

Given languages $L_{1}$ and $L_{2}$ such that

$L_{1} = \{ a^{n}b^{m} : n \geq 1, m \geq 0 \} \cup \{ ba \}$

$L_{2} = \{ b^{m} : m \geq 1 \}$

Then

$L_{1} / L_{2} = \{ a^{n}b^{m} : n \geq 1, m \geq 0 \}$.

The strings in $L_{2}$ consist of one or more $b$’s. Therefore, we arrive at the answer by removing one or more $b$’s from those strings in $L_{1}$ that terminate with at least one $b$ as a suffix.

Note that in this example $L_{1}$, $L_{2}$, and $L_{1} / L_{2}$ are regular.

Can we construct a dfa for $L_{1} / L_{2}$ from dfas for $L_{1}$ and $L_{2}$?

Linz Figure 4.1 shows a dfa $M_{1}$ that accepts $L_{1}$.

Linz Fig. 4.1: DFA for Example 4.4 $L_{1}$

An automaton for $L_{1} / L_{2}$ must accept any $x$ such that $xy \in L_{1}$ for some $y \in L_{2}$.

For all states $q \in M_{1}$, if there exists a walk labeled $v$ from $q$ to a final state $q_{f}$ such that $v \in L_{2}$, then make $q$ a final state of the automaton for $L_{1} / L_{2}$.

In this example, we check states to see if there is a $bb^{*}$ walk to any of the final states $q_{1}$, $q_{2}$, or $q_{4}$.

  • $q_{1}$ and $q_{2}$ have such walks.

  • $q_{0}$, $q_{3}$, and $q_{4}$ do not.

The resulting automaton is shown in Linz Figure 4.2.

Linz Fig. 4.2: DFA for Example 4.4 $L_{1} / L_{2}$ (except that $q_{4}$ is not final)

The next theorem generalizes this construction.

4.1.11 Closure under Right Quotient

Linz Theorem 4.4 (Closure under Right Quotient): If $L_{1}$ and $L_{2}$ are regular languages, then $L_{1} / L_{2}$ is also regular. We say that the family of regular languages is closed under right quotient with a regular language.

Proof

Let dfa $M = (Q, \Sigma, \delta, q_{0}, F)$ such that $L(M) = L_{1}$.

Construct dfa $\widehat{M} = (Q, \Sigma, \delta, q_{0}, \widehat{F})$ for $L_{1}/L_{2}$ as follows.

For all $q_{i} \in Q$, let dfa $M_{i} = (Q, \Sigma, \delta, q_{i}, F)$. That is, dfa $M_{i}$ is the same as $M$ except that it starts at $q_{i}$.

  • From Theorem 4.1, we know $L(M_{i}) \cap L_{2}$ is regular. Thus we can construct the intersection machine as shown in the proof of Theorem 4.1.

  • If there is any path in the intersection machine from its initial state to a final state, then $L(M_{i}) \cap L_{2} \neq \emptyset$. In that case, we put $q_{i} \in \widehat{F}$ in machine $\widehat{M}$.

Does $L(\widehat{M}) = L_{1} / L_{2}$?

First, let $x \in L_{1} / L_{2}$.

  • By definition, there must be $y \in L_{2}$ such that $xy \in L_{1}$.

  • Thus $\delta^{*}(q_{0},xy) \in F$.

  • There must be some $q$ such that $\delta^{*}(q_{0},x) = q$ and $\delta^{*}(q,y) \in F$.

  • Thus, by construction, $q \in \widehat{F}$. Hence, $\widehat{M}$ accepts $x$.

Now, let $x$ be accepted by $\widehat{M}$.

  • $\delta^{*}(q_{0},x) = q \in \widehat{F}$.

  • Thus, by construction, we know there is a $y \in L_{2}$ such that $\delta^{*}(q,y) \in F$.

  • Hence $\delta^{*}(q_{0},xy) \in F$, so $xy \in L_{1}$ and therefore $x \in L_{1} / L_{2}$.

Thus $L(\widehat{M}) = L_{1} / L_{2}$, which means $L_{1} / L_{2}$ is regular.

4.1.12 Linz Example 4.5

Find $L_{1} / L_{2}$ for

$L_{1} = L(a^{*}baa^{*})$

$L_{2} = L(ab^{*})$

We apply the construction (algorithm) used in the proof of Theorem 4.4.

Linz Figure 4.3 shows a dfa for $L_{1}$.

Linz Fig. 4.3: DFA for Example 4.5 $L_{1}$

Let $M = (Q, \Sigma, \delta, q_{0}, F)$.

Thus if we construct the sequence of machines $M_{i}$

$L(M_{0}) \cap L_{2} = \emptyset$

$L(M_{1}) \cap L_{2} = \{a\} \neq \emptyset$

$L(M_{2}) \cap L_{2} = \{a\} \neq \emptyset$

$L(M_{3}) \cap L_{2} = \emptyset$

then the resulting dfa for $L_{1} / L_{2}$ is shown in Linz Figure 4.4.

Linz Fig. 4.4: DFA for Example 4.5 $L_{1} / L_{2}$

The automaton shown in Figure 4.4 accepts the language denoted by the regular expression

$a^{*}b + a^{*}baa^{*}$

which can be simplified to

$a^{*}ba^{*}$
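The construction from the proof of Theorem 4.4 can be sketched in Python and applied to this example. The transition tables are assumptions: the figures are not reproduced in these notes, so `d1` is one plausible four-state dfa for $L(a^{*}baa^{*})$ with trap state `q3`, and `d2` is a hypothetical dfa for $L(ab^{*})$ with trap state `p2`. Instead of building each intersection machine in full, the sketch searches the product machine for a reachable pair of final states.

```python
def nonempty_intersection(d1, s1, f1, d2, s2, f2, alphabet):
    """True if the product machine reaches a pair of final states."""
    seen, stack = {(s1, s2)}, [(s1, s2)]
    while stack:
        q, p = stack.pop()
        if q in f1 and p in f2:
            return True
        for a in alphabet:
            nxt = (d1[(q, a)], d2[(p, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def right_quotient_finals(states, d1, f1, d2, s2, f2, alphabet):
    """q_i becomes final exactly when L(M_i) and L2 share a string."""
    return {q for q in states
            if nonempty_intersection(d1, q, f1, d2, s2, f2, alphabet)}

def run_dfa(delta, start, finals, w):
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

# Assumed dfa for L1 = L(a*baa*); q3 is a trap state.
d1 = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "a"): "q2", ("q1", "b"): "q3",
      ("q2", "a"): "q2", ("q2", "b"): "q3", ("q3", "a"): "q3", ("q3", "b"): "q3"}
# Assumed dfa for L2 = L(ab*); p2 is a trap state.
d2 = {("p0", "a"): "p1", ("p0", "b"): "p2", ("p1", "a"): "p2", ("p1", "b"): "p1",
      ("p2", "a"): "p2", ("p2", "b"): "p2"}

new_finals = right_quotient_finals({"q0", "q1", "q2", "q3"}, d1, {"q2"},
                                   d2, "p0", {"p1"}, "ab")
print(sorted(new_finals))
```

With these tables the new final states are `q1` and `q2`, which agrees with the four intersections listed in the example, and the resulting machine accepts strings of the form $a^{*}ba^{*}$.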

4.2 Elementary Questions about Regular Languages

4.2.1 Membership?

Fundamental question: Is $w \in L$?

It is difficult to find a membership algorithm for languages in general. But it is relatively easy to do for regular languages.

A regular language is given in a standard representation if and only if described with one of:

  • a dfa or nfa
  • a regular expression
  • a regular grammar

Linz Theorem 4.5 (Membership): Given a standard representation of any regular language $L$ on $\Sigma$ and any $w \in \Sigma^{*}$, there exists an algorithm for determining whether or not $w$ is in $L$.

Proof

We represent the language by some dfa, then test $w$ to see if it is accepted by this automaton. QED.

4.2.2 Finite or Infinite?

Linz Theorem 4.6 (Finiteness): There exists an algorithm for determining whether a regular language, given in standard representation, is empty, finite, or infinite.

Proof

Represent $L$ as a transition graph of a dfa.

  • If a simple path exists from the initial state to any final state, then the language is not empty. Otherwise, it is empty.

  • If any vertex on a cycle is in a path from the initial state to any final state, then the language is infinite. Otherwise, it is finite.

QED.
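The two graph tests in this proof can be sketched as one function. The name `classify`, the dictionary encoding, and the example dfas are hypothetical. A state is treated as useful when it is reachable from the initial state and can reach a final state; the language is infinite exactly when some cycle lies among the useful states.

```python
def classify(delta, start, finals, alphabet):
    """Return 'empty', 'finite', or 'infinite' for the language of a dfa."""
    states = {q for (q, _) in delta} | set(delta.values()) | {start} | set(finals)
    succ = {q: {delta[(q, a)] for a in alphabet if (q, a) in delta} for q in states}
    pred = {q: {p for p in states if q in succ[p]} for q in states}

    def reachable(sources, graph):
        seen, stack = set(sources), list(sources)
        while stack:
            q = stack.pop()
            for r in graph[q]:
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    useful = reachable({start}, succ) & reachable(finals, pred)
    if not useful:
        return "empty"
    # Repeatedly delete useful states with no useful successor left;
    # whatever survives must lie on or lead into a cycle of useful states.
    remaining = set(useful)
    changed = True
    while changed:
        changed = False
        for q in list(remaining):
            if not (succ[q] & remaining):
                remaining.discard(q)
                changed = True
    return "infinite" if remaining else "finite"

# Hypothetical dfa for the single string ab; t is a trap state.
d_ab = {("s0", "a"): "s1", ("s0", "b"): "t", ("s1", "a"): "t", ("s1", "b"): "s2",
        ("s2", "a"): "t", ("s2", "b"): "t", ("t", "a"): "t", ("t", "b"): "t"}
# Hypothetical dfa for a*b; u is a trap state.
d_astar_b = {("r0", "a"): "r0", ("r0", "b"): "r1",
             ("r1", "a"): "u", ("r1", "b"): "u", ("u", "a"): "u", ("u", "b"): "u"}
print(classify(d_ab, "s0", {"s2"}, "ab"), classify(d_astar_b, "r0", {"r1"}, "ab"))
```

Restricting the cycle test to useful states matters: the self-loop on a trap state never contributes strings to the language.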

4.2.3 Equality?

Consider the question: Is $L_{1} = L_{2}$?

This is practically important. But it is a difficult issue because there are many ways to represent $L_{1}$ and $L_{2}$.

Linz Theorem 4.7 (Equality): Given a standard representation of two regular languages $L_{1}$ and $L_{2}$, there exists an algorithm to determine whether or not $L_{1} = L_{2}$.

Proof

Let $L_{3} = (L_{1} \cap \bar{L_{2}}) \cup (\bar{L_{1}} \cap L_{2})$.

By closure, $L_{3}$ is regular. Hence, there is a dfa $M$ that accepts $L_{3}$.

Because of Theorem 4.6, we can determine whether $L_{3}$ is empty or not.

But from Exercise 8, Section 1.1, we see that $L_{3} = \emptyset$ if and only if $L_{1} = L_{2}$. QED.
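One way to carry out this test, sketched here under stated assumptions, is to search the product of the two dfas for a reachable pair in which exactly one component is final. Such a pair is a reachable final state of a machine for $L_{3}$, so finding one shows $L_{3} \neq \emptyset$. The function name and the example machines (counting a’s modulo 2, 4, and 3 over a one-symbol alphabet) are hypothetical.

```python
def equivalent(d1, s1, f1, d2, s2, f2, alphabet):
    """True if no reachable product state separates the two dfas."""
    seen, stack = {(s1, s2)}, [(s1, s2)]
    while stack:
        q, p = stack.pop()
        if (q in f1) != (p in f2):
            return False  # some string lies in exactly one of the languages
        for a in alphabet:
            nxt = (d1[(q, a)], d2[(p, a)])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True

# Hypothetical dfas over {a}: state k means "number of a's so far, mod n".
mod2 = {(k, "a"): (k + 1) % 2 for k in range(2)}
mod4 = {(k, "a"): (k + 1) % 4 for k in range(4)}
mod3 = {(k, "a"): (k + 1) % 3 for k in range(3)}

same = equivalent(mod2, 0, {0}, mod4, 0, {0, 2}, "a")   # both accept even lengths
different = equivalent(mod2, 0, {0}, mod3, 0, {0}, "a")
print(same, different)
```

The search visits at most $|Q| \times |P|$ pairs, so it always terminates.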

4.3 Identifying Nonregular Languages

A regular language may be infinite.

In processing a string, the amount of information that the automaton must “remember” is strictly limited (finite and bounded).

4.3.1 Using the Pigeonhole Principle

In mathematics, the pigeonhole principle refers to the following simple observation:

If we put $n$ objects into $m$ boxes (pigeonholes), and, if $n > m$, at least one box must hold more than one item.

This is obvious, but it has deep implications.

4.3.2 Linz Example 4.6

Is the language $L = \{ a^{n}b^{n} : n \geq 0 \}$ regular?

The answer is no, as we show below.

Proof that $L$ is not regular

Strategy: Use proof by contradiction. Assume that what we want to prove is false. Show that this introduces a contradiction. Hence, the assumption must be false and the original claim must be true.

Assume $L$ is regular.

Thus there exists a dfa $M = (Q,\{ a,b \},\delta,q_{0},F)$ such that $L(M) = L$.

Machine $M$ has a specific number of states. However, the number of $a$’s in a string in $L(M)$ is finite but unbounded (i.e., no maximum value for the length). If $n$ is larger than the number of states in $M$, then, according to the pigeonhole principle, there must be some state $q$ such that

$\delta^{*}(q_{0},a^{n}) = q$

and

$\delta^{*}(q_{0}, a^{m}) = q$

with $n \neq m$. But, because $M$ accepts $a^{n}b^{n}$,

$\delta^{*}(q,b^{n}) = q_{f}$

for some $q_{f} \in F$.

From this we reason as follows:

$\delta^{*}(q_{0}, a^{m}b^{n})$
$=$ $\delta^{*}(\delta^{*}(q_{0}, a^{m}), b^{n})$
$=$ $\delta^{*}(q, b^{n})$
$=$ $q_{f}$

But this contradicts the assumption that $M$ accepts $a^{m}b^{n}$ only if $n = m$. Therefore, $L$ cannot be regular. QED.
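The pigeonhole step can be made concrete for any particular dfa. The sketch below, with hypothetical names and a hypothetical example machine for $L(a^{*}b^{*})$, finds two different exponents $m \neq n$ for which $a^{m}$ and $a^{n}$ reach the same state. From that point on the machine cannot tell the two prefixes apart, so it treats $a^{m}b^{n}$ and $a^{n}b^{n}$ identically.

```python
def run_to_state(delta, start, w):
    """Return the state reached after reading w."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state

def find_collision(delta, start):
    """Return (m, n) with m < n such that a^m and a^n reach the same state."""
    first_seen = {}
    state, count = start, 0
    while state not in first_seen:
        first_seen[state] = count
        state = delta[(state, "a")]
        count += 1
    return first_seen[state], count

# Hypothetical dfa for L(a*b*); state 2 is a trap state.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1,
         (2, "a"): 2, (2, "b"): 2}
finals = {0, 1}

m, n = find_collision(delta, 0)
accepts_equal = run_to_state(delta, 0, "a" * n + "b" * n) in finals
accepts_unequal = run_to_state(delta, 0, "a" * m + "b" * n) in finals
print(m, n, accepts_equal, accepts_unequal)
```

Whatever dfa is supplied, the two acceptance results agree, so no dfa can accept $a^{n}b^{n}$ while rejecting $a^{m}b^{n}$.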

We can use the pigeonhole principle to make “finite memory” precise.

4.3.3 Pumping Lemma for Regular Languages

Linz Theorem 4.8 (Pumping Lemma for Regular Languages): Let $L$ be an infinite regular language. There exists some $m > 0$ such that any $w \in L$ with $|w| \geq m$ can be decomposed as

$w = xyz$

with

$|xy| \leq m$

and

$|y| \geq 1$

such that

$w_{i} = xy^{i}z$

is also in $L$ for all $i \geq 0$.

That is, we can break every sufficiently long string from $L$ into three parts in such a way that an arbitrary number of repetitions of the middle part yields another string in $L$.

We can “pump” the middle string, which gives us the name pumping lemma for this theorem.

Proof

Let $L$ be an infinite regular language. Thus there exists a dfa $M$ that accepts $L$. Let $M$ have states $q_{0}, q_{1}, q_{2}, \cdots, q_{n}$.

Consider a string $w \in L$ such that $|w| \geq m = n + 1$. Such a string exists because $L$ is infinite.

Consider the sequence of states $q_{0}, q_{i}, q_{j}, \cdots, q_{f}$ that $M$ traverses as it processes $w$.

The length of this sequence is exactly $|w| + 1$. Thus, according to the pigeonhole principle, at least one state must be repeated, and such a repetition must start no later than the $n$th move.

Thus the sequence is of the form

$q_{0}, q_{i}, q_{j}, \cdots, q_{r}, \cdots, q_{r}, \cdots, q_{f}$.

Then there are substrings $x$, $y$, and $z$ of $w$ such that

$\delta^{*}(q_{0}, x) = q_{r}$

$\delta^{*}(q_{r}, y) = q_{r}$

$\delta^{*}(q_{r}, z) = q_{f}$

with $|xy| \leq n + 1 = m$ and $|y| \geq 1$. Thus, for any $k \geq 0$,

$\delta^{*}(q_{0}, xy^{k}z) = q_{f}$

QED.
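The proof is constructive, and the decomposition it describes can be computed by recording the first repeated state along the run. The sketch below uses hypothetical names and an assumed four-state dfa for $L(a^{*}baa^{*})$, so $m = 4$ here.

```python
def accepts(delta, start, finals, w):
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

def pumping_decomposition(delta, start, w):
    """Split w = xyz at the first state that the run visits twice."""
    first_visit = {start: 0}
    state = start
    for position, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in first_visit:
            i = first_visit[state]
            return w[:i], w[i:position], w[position:]
        first_visit[state] = position
    return None  # no state repeated, so w was too short to force a cycle

# Assumed dfa for L(a*baa*); q3 is a trap state.
delta = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "a"): "q2", ("q1", "b"): "q3",
         ("q2", "a"): "q2", ("q2", "b"): "q3", ("q3", "a"): "q3", ("q3", "b"): "q3"}
finals = {"q2"}

x, y, z = pumping_decomposition(delta, "q0", "baaa")
print(x, y, z)
```

Because $y$ labels a cycle from the repeated state back to itself, every string $xy^{k}z$ ends in the same state as $w$ does.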

We can use the pumping lemma to show that languages are not regular. Each of these is a proof by contradiction.

4.3.4 Linz Example 4.7

Show that $L = \{ a^{n}b^{n} : n \geq 0 \}$ is not regular.

Assume that $L$ is regular, so that the Pumping Lemma must hold.

If, for some $n \geq 0$ and $i \geq 0$, $xyz = a^{n}b^{n}$ and $xy^{i}z$ are both in $L$, then $y$ must be all $a$’s or all $b$’s.

We do not know what $m$ is, but, whatever $m$ is, the Pumping Lemma enables us to choose a string $w = a^{m}b^{m}$. Thus $y$ must consist entirely of $a$’s.

Suppose $k > 0$. We must decompose $w = xyz$ as follows for some $p+k \leq m$:

$x = a^{p}$

$y = a^{k}$

$z = a^{m-p-k} b^{m}$

From the Pumping Lemma

$w_{0} = a^{m-k}b^{m}$.

Clearly, this is not in $L$. But this contradicts the Pumping Lemma.

Hence, the assumption that $L$ is regular is false. Thus $\{ a^{n}b^{n}: n \geq 0 \}$ is not regular.

4.3.5 Using the Pumping Lemma (Viewed as a Game)

The Pumping Lemma guarantees the existence of $m$ and a decomposition $xyz$ for any sufficiently long string in an infinite regular language.

  • But we do not know what $m$ and $xyz$ are.

  • We do not have a contradiction if the Pumping Lemma is violated for some specific $m$ or $xyz$.

The Pumping Lemma holds for all $w \in L$ with $|w| \geq m$ and for all $i \geq 0$ (i.e., $xy^{i}z \in L$ for all $i$).

  • We do have a contradiction if the Pumping Lemma is violated for some $w$ or $i$.

We can thus conceptualize a proof as a game against an opponent.

  • Our goal: Establish a contradiction of the Pumping Lemma.

  • Opponent’s goal: Stop us.

  • Moves:

    1. The opponent picks $m$.

    2. Given $m$, we pick a string $w$ in $L$ of length equal to or greater than $m$. We are free to choose any $w$, subject to the requirement that $w \in L$ and $|w| \geq m$.

    3. The opponent chooses the decomposition $xyz$, subject to $|xy| \leq m$ and $|y| \geq 1$. We have to assume that the opponent makes the choice that will make it hardest for us to win the game.

    4. We try to pick $i$ in such a way that the pumped string $w_{i}$, as defined in $w_{i} = xy^{i}z$, is not in $L$. If we can do so, we win the game.

Strategy:

  • Choose $w$ in step 2 carefully, so that, regardless of the opponent’s choice of $xyz$, a contradiction can be established.
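For a fixed $m$, the game can be played out exhaustively by a program. The following sketch uses hypothetical names and takes $L = \{ a^{n}b^{n} : n \geq 0 \}$ with our choice $w = a^{m}b^{m}$ as its example. It enumerates every decomposition available to the opponent in step 3 and checks that some small $i$ pumps the string out of $L$. This checks particular values of $m$ only; the proof itself must argue for every $m$.

```python
def decompositions(w, m):
    """All splits w = xyz the opponent may choose: |xy| <= m and |y| >= 1."""
    for xy_len in range(1, min(m, len(w)) + 1):
        for x_len in range(xy_len):
            yield w[:x_len], w[x_len:xy_len], w[xy_len:]

def we_win(w, m, in_language, max_i=3):
    """True if every legal decomposition has an i in 0..max_i leaving L."""
    return all(
        any(not in_language(x + y * i + z) for i in range(max_i + 1))
        for x, y, z in decompositions(w, m)
    )

def in_anbn(s):
    """Membership test for { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

results = [we_win("a" * m + "b" * m, m, in_anbn) for m in range(1, 7)]
print(results)
```

Finding a witness $i$ is conclusive for that decomposition, whereas failing to find one within `max_i` would not by itself show that the opponent wins.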

4.3.6 Linz Example 4.8

Let $\Sigma = \{ a, b \}$. Show that

$L = \{ww^{R}: w \in \Sigma^{*}\}$

is not regular.

We use the Pumping Lemma and assume $L$ is regular.

Whatever $m$ the opponent picks in step 1 (of the “game”), we can choose a $w$ as shown below in step 2.

Linz Fig. 4.5

Because of this choice, and the requirement that $|xy| \leq m$, in step 3 the opponent must choose a $y$ that consists entirely of $a$’s. Consider

$w_{i} = xy^{i}z$

that must hold because of the Pumping Lemma.

In step 4, we use $i = 0$ in $w_{i} = xy^{i}z$. This string has fewer $a$’s on the left than on the right and so cannot be of the form $ww^{R}$.

Therefore, the Pumping Lemma is violated. $L$ is not regular.

Warning: Be careful! There are ways we can go wrong in applying the Pumping Lemma.

  • If we choose $w$ too short in step 2 of this example (i.e., where the first $m$ symbols include two or more $b$’s), then the opponent can choose a $y$ having an even number of $b$’s. In that case, we could not have reached a violation of the Pumping Lemma on the last step.

  • Suppose we choose a string $w$ consisting of all $a$’s, say

    $w = a^{2m}$

    which is in $L$. To defeat us, the opponent need only pick

    $y = aa$

    Now $w_{i}$ is in $L$ for all $i$, and we lose.

  • We must assume the opponent does not make mistakes. If, in the case where we pick $w = a^{2m}$, the opponent picks

    $y = a$

    then $w_{0}$ is a string of odd length and therefore not in $L$. But any argument is incorrect if it assumes the opponent fails to make the best possible choice (i.e., $y = aa$).
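The difference between a good and a bad choice of $w$ can be seen by listing the decompositions with which the opponent escapes. In the sketch below the names are hypothetical, and the good string $a^{m}b^{m}b^{m}a^{m}$ is an assumption about what Linz Figure 4.5 shows, since the figure is not reproduced in these notes. Membership in $L$ is tested as "even-length palindrome".

```python
def decompositions(w, m):
    """All splits w = xyz the opponent may choose: |xy| <= m and |y| >= 1."""
    for xy_len in range(1, min(m, len(w)) + 1):
        for x_len in range(xy_len):
            yield w[:x_len], w[x_len:xy_len], w[xy_len:]

def in_wwr(s):
    """Membership test for { w w^R }: the even-length palindromes."""
    return len(s) % 2 == 0 and s == s[::-1]

def opponent_escapes(w, m, max_i=4):
    """Decompositions for which every tested pumped string stays in L."""
    return [(x, y, z) for x, y, z in decompositions(w, m)
            if all(in_wwr(x + y * i + z) for i in range(max_i + 1))]

m = 4
good = "a" * m + "b" * m + "b" * m + "a" * m   # assumed choice of w
bad = "a" * (2 * m)                            # the all-a's string from the warning

good_escapes = opponent_escapes(good, m)
bad_escapes = opponent_escapes(bad, m)
print(len(good_escapes), [y for (_, y, _) in bad_escapes])
```

For the all-$a$’s string, every escaping decomposition has a $y$ of even length, which matches the warning that the opponent need only pick $y = aa$.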

4.3.7 Linz Example 4.9

For $\Sigma = \{ a,b \}$, show that the language

$L = \{ w \in \Sigma^{*}: n_{a}(w) < n_{b}(w) \}$

is not regular.

We use the Pumping Lemma to show a contradiction. Assume $L$ is regular.

Suppose the opponent gives us $m$. Because we have complete freedom in choosing $w \in L$, we pick $w = a^{m}b^{m+1}$. Now, because $|xy|$ cannot be greater than $m$, the opponent cannot do anything but pick a $y$ with all $a$’s, that is,

$y = a^{k}$ for $1 \leq k \leq m$.

We now pump up, using $i = 2$. The resulting string

$w_{2} = a^{m+k}b^{m+1}$

is not in $L$. Therefore, the Pumping Lemma is violated. $L$ is not regular.

4.3.8 Linz Example 4.10

Show that

$L = \{ (ab)^{n}a^{k}: n > k, k \geq 0 \}$

is not regular.

We use the Pumping Lemma to show a contradiction. Assume $L$ is regular.

Given some $m$, we pick as our string

$w = (ab)^{m+1}a^{m}$

which is in $L$.

The opponent must decompose $w = xyz$ so that $|xy| \leq m$ and $|y| \geq 1$. Thus both $x$ and $y$ must be in the part of the string consisting of $ab$ pairs. The choice of $x$ does not affect the argument, so we can focus on the $y$ part.

If our opponent picks $y = a$, we can choose $i = 0$ and get a string not in $L((ab)^{*}a^{*})$ and, hence, not in $L$. (There is a similar argument for $y = b$.)

If the opponent picks $y = ab$, we can choose $i = 0$ again. Now we get the string $(ab)^{m}a^{m}$, which is not in $L$. (There is a similar argument for $y = ba$.)

In a similar manner, we can counter any possible choice by the opponent. Thus, because of the contradiction, $L$ is not regular.

4.3.9 Linz Example (Factorial Length Strings)

Note: This example is adapted from an earlier edition of the Linz textbook.

Show that

$L = \{a^{n!} : n \geq 0\}$

is not regular.

We use the Pumping Lemma to show a contradiction. Assume $L$ is regular.

Given the opponent’s choice for $m$, we pick $w$ to be the string $a^{m!}$ (unless the opponent picks $m < 3$, in which case we can use $a^{3!}$ as $w$).

The possible decompositions $w = xyz$ (such that $|xy| \leq m$) differ only in the lengths of $x$ and $y$. Suppose the opponent picks $y$ such that

$|y| = k \leq m$.

According to the Pumping Lemma, $xz = a^{m!-k} \in L$. But this string can only be in $L$ if there exists a $j$ such that

$m! - k = j!$.

But this is impossible. Because $k \geq 1$, we have $m! - k < m!$, and, for $m \geq 3$ and $k \leq m$, we know (see argument below) that

$m! - k > (m - 1)!$.

Thus $m! - k$ lies strictly between two consecutive factorials and cannot itself be a factorial. Therefore, $L$ is not regular.

Aside: To see that $m! - k > (m - 1)!$ for $m \geq 3$ and $k \leq m$, note that

$m! - k \geq m! - m = m(m - 1)! - m = m((m-1)! - 1) > (m-1)!$.
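The inequality in the aside is easy to spot-check numerically. The helper name `gap_holds` is hypothetical; it tests whether $m! - k$ lies strictly between $(m-1)!$ and $m!$ for every $k$ with $1 \leq k \leq m$.

```python
from math import factorial

def gap_holds(m):
    """True if (m-1)! < m! - k < m! for every k with 1 <= k <= m."""
    return all(
        factorial(m - 1) < factorial(m) - k < factorial(m)
        for k in range(1, m + 1)
    )

checked = {m: gap_holds(m) for m in range(2, 12)}
print(checked)
```

The check fails for $m = 2$ (for instance, $2! - 1 = 1!$), which is why the argument falls back to $w = a^{3!}$ when the opponent picks $m < 3$.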

4.3.10 Linz Example 4.12

Show that the language

$L = \{a^{n}b^{k}c^{n+k}: n \geq 0, k \geq 0\}$

is not regular.

Strategy: Instead of using the Pumping Lemma directly, we show that $L$ is related to another language we already know is nonregular. This may be an easier argument.

In this example, we use the closure property under homomorphism (Linz Theorem 4.3).

Let $h$ be defined such that

$h(a) = a, h(b) = a, h(c) = c$.

Then

$h(L)$ $=$ $\{ a^{n+k}c^{n+k} : n + k \geq 0 \}$
$=$ $\{ a^{i}c^{i} : i \geq 0 \}$

But we proved this language (with $c$ in place of $b$) was not regular in Linz Example 4.6. Therefore, because of closure under homomorphism, $L$ cannot be regular either.

Alternative proof by contradiction

Assume $L$ is regular.

Thus $h(L)$ is regular by closure under homomorphism (Linz Theorem 4.3).

But we know $h(L)$ is not regular, so there is a contradiction.

Thus, $L$ is not regular.

4.3.11 Linz Example 4.13

Show that the language

$L = \{ a^{n}b^{l}: n \ne l \}$

is not regular.

We use the Pumping Lemma, but this example requires more ingenuity to set up than previous examples.

Assume $L$ is regular.

Choosing a string $w \in L$ with $m = n = l + 1$ or $m = n = l + 2$ will not lead to a contradiction.

In these cases, the opponent can always choose a decomposition $w = xyz$ (with $|xy| \leq m$ and $|y| \geq 1$) that will make it impossible to pump the string out of the language (that is, pump it so that it has an equal number of $a$’s and $b$’s). For $w = a^{l+1}b^{l}$, the opponent can choose $y$ to be an even number of $a$’s. For $w = a^{l+2}b^{l}$, the opponent can choose $y$ to be an odd number of $a$’s greater than 1.

We must be more creative. Suppose we choose $w \in L$ where $n = m!$ and $l = (m + 1)!$.

If the opponent decomposes $w = xyz$ (with $|xy| \leq m$ and $|y| = k \geq 1$), then $y$ must consist of all $a$’s.

If we pump $i$ times, we generate string $xy^{i}z$ where the number of $a$’s is $m! + (i-1)k$.

We can contradict the Pumping Lemma if we can pick $i$ such that

$m! + (i - 1)k = (m + 1)!$.

But we can do this, because it is always possible to choose

$i = 1 + m\,m!/k$.

For $1 \leq k \leq m$, the expression $1 + m\,m!/k$ is an integer because $k$ divides $m!$.

Thus the generated string has $m! + ((1 + m\,m!/k) - 1)k$ occurrences of $a$.

$m! + ((1 + m\,m!/k) - 1)k$
$=$ $m! + m\,m!$
$=$ $m!(m + 1)$
$=$ $(m+1)!$

This introduces a contradiction of the Pumping Lemma. Thus $L$ is not regular.
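The arithmetic behind the choice of $i$ can be verified for small values of $m$. The function names below are hypothetical; the code checks that $i = 1 + m\,m!/k$ is an integer for each allowed $k$ and that pumping with this $i$ produces exactly $(m+1)!$ occurrences of $a$.

```python
from math import factorial

def pump_exponent(m, k):
    """The i = 1 + m*m!/k from the text, computed with exact integer division."""
    numerator = m * factorial(m)
    if numerator % k != 0:
        raise ValueError("k does not divide m * m!")
    return 1 + numerator // k

def pumped_a_count(m, k):
    """Number of a's in x y^i z when w starts with m! a's and |y| = k."""
    i = pump_exponent(m, k)
    return factorial(m) + (i - 1) * k

all_match = all(
    pumped_a_count(m, k) == factorial(m + 1)
    for m in range(1, 8)
    for k in range(1, m + 1)
)
print(all_match, pump_exponent(3, 2))
```

For example, with $m = 3$ and $k = 2$ the exponent is $i = 1 + 18/2 = 10$, and the pumped string has $6 + 9 \cdot 2 = 24 = 4!$ occurrences of $a$.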

Alternative argument (more elegant)

Suppose $L = \{ a^{n}b^{l}: n \ne l \}$ is regular.

Because of complementation closure, $\bar{L}$ is regular.

Let $L_{1} = \bar{L} \cap L(a^{*}b^{*})$.

But $L(a^{*}b^{*})$ is regular and thus, by intersection closure, $L_{1}$ is also regular.

But $L_{1} = \{ a^{n}b^{n} : n \geq 0 \}$, which we have shown to be nonregular. Thus we have a contradiction, so $L$ is not regular.

4.3.12 Pitfalls in Using the Pumping Lemma

The Pumping Lemma is difficult to understand and, hence, difficult to apply.

Here are a few suggestions to avoid pitfalls in use of the Pumping Lemma.

  • Do not attempt to use the Pumping Lemma to show a language is regular. Only use it to show a language is not regular.

  • Make sure you start with a string that is in the language.

  • Avoid invalid assumptions about the decomposition of a string $w$ into $xyz$. Use only that $|xy| \leq m$ and $|y| \geq 1$.

Like most interesting “games”, knowledge of the rules for use of the Pumping Lemma is necessary, but it is not sufficient to become a master “player”. To master the use of the Pumping Lemma, one must work problems of various difficulties. Practice, practice, practice.

4.4 References

[1]
Peter Linz. 2011. An introduction to formal languages and automata (Fifth ed.). Jones & Bartlett, Burlington, Massachusetts, USA.