Copyright (C) 2015, H. Conrad Cunningham
Acknowledgements: MS student Eli Allen assisted in preparation of these notes. These lecture notes are for use with Chapter 4 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012. The terminology and notation used in these notes are similar to those used in the Linz textbook. This document uses several figures from the Linz textbook.
Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.
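The questions answered in this chapter include: What closure properties does the family of regular languages have? Which elementary questions about regular languages (such as membership, emptiness, finiteness, and equality) can be answered by algorithms? How can we show that a language is not regular?
The concepts introduced in this chapter are: operation, closure, homomorphism, homomorphic image, right quotient, standard representation, membership algorithm, pigeonhole principle, and the Pumping Lemma.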
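Definition (Operation): An operation is a function $p: V_1 \times V_2 \times \cdots \times V_k \rightarrow R$ for some sets $V_1, V_2, \ldots, V_k$ and $R$ with $k \geq 1$. The integer $k$ is the number of operands (or arguments) of the operation.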
We often use special notation and conventions for unary and binary operations. For example:
a binary operation may be written in an infix style as in $x + y$ and $x \cup y$,
a unary operation may be written in a prefix style as in $-x$, suffix style such as $n!$, or special style such as $\sqrt{x}$ or $\overline{L}$, and
a binary operation may be implied by juxtaposition, such as $xy$ for multiplication or (in a different context) $w_1 w_2$ for string concatenation, or implied by superscripting, such as $x^y$ for exponentiation.
Often we consider an operation on a set, where all the operands and the result are drawn from the same set.
Definition (Closure): A set $S$ is closed under a unary operation $p$ if, for all $x \in S$, $p(x) \in S$. Similarly, a set $S$ is closed under a binary operation $\oplus$ if, for all $x \in S$ and $y \in S$, $x \oplus y \in S$.
Examples: arithmetic on the set of natural numbers ($\mathbb{N}$)
Binary operations addition ($+$) and multiplication ($\times$, or $*$ in programming languages) are closed on $\mathbb{N}$.
Binary operations subtraction ($-$) and division ($/$) are not closed on $\mathbb{N}$.
For example, $1 - 2$ is not a natural number.
For example, $1 / 2$ is not a natural number.
Unary operation negation (operator $-$ written in prefix form) is not closed on $\mathbb{N}$.
However, the set of integers is closed under subtraction and negation. But it is not closed under division or square root (as we normally define the operations).
Now, let's consider closure of the set of regular languages with respect to the simple set operations.
Linz Theorem 4.1 (Closure under Simple Set Operations): If $L_1$ and $L_2$ are regular languages, then so are $L_1 \cup L_2$, $L_1 \cap L_2$, $L_1 L_2$, $\overline{L_1}$, and $L_1^*$.
That is, we say that the family of regular languages is closed under union, intersection, concatenation, complementation, and star-closure.
Proof of $L_1 \cup L_2$
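Let $L_1$ and $L_2$ be regular languages.
$L_1$ and $L_2$ are regular languages
$\Rightarrow$ { Th. 3.2: there exist regular expressions $r_1$, $r_2$ }
$L_1 = L(r_1)$ and $L_2 = L(r_2)$
$\Rightarrow$ { Def. 3.2, rule 4 }
$L_1 \cup L_2 = L(r_1) \cup L(r_2) = L(r_1 + r_2)$
Thus, by Theorem 3.1 (regular expressions describe regular languages), $L_1 \cup L_2$ is a regular language. QED.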
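To make Def. 3.2, rule 4 concrete, here is a minimal Python sketch using the standard `re` module, where alternation is written `|` instead of Linz's `+`. The expressions `R1` and `R2` are hypothetical examples chosen for illustration, and `union_agrees` is a finite spot check (all strings over $\{a, b\}$ up to a small length), not a proof.

```python
import itertools
import re

R1 = "a*b"       # hypothetical L1 = L(a*b)
R2 = "(ab)*"     # hypothetical L2 = L((ab)*)
R_UNION = f"(?:{R1})|(?:{R2})"   # r1 + r2 in Linz's notation

def in_lang(regex: str, w: str) -> bool:
    """True if the whole of w is in the language denoted by regex."""
    return re.fullmatch(regex, w) is not None

def strings(alphabet: str, max_len: int):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for t in itertools.product(alphabet, repeat=n):
            yield "".join(t)

def union_agrees(max_len: int = 6) -> bool:
    """Spot check: R_UNION matches w exactly when R1 or R2 does."""
    return all(in_lang(R_UNION, w) == (in_lang(R1, w) or in_lang(R2, w))
               for w in strings("ab", max_len))

print(union_agrees())  # True
```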
Proofs of $L_1 L_2$ and $L_1^*$
Similar to the proof of $L_1 \cup L_2$, using rules 5 (concatenation) and 7 (star-closure) of Def. 3.2.
Proof of $\overline{L_1}$
Strategy: Given a dfa for the regular language, construct a new dfa that accepts everything rejected and rejects everything accepted by the given dfa.
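$L_1$ is a regular language on $\Sigma$
$\Rightarrow$ { Def. 2.3 }
there exists a dfa $M = (Q, \Sigma, \delta, q_0, F)$ such that $L(M) = L_1$.
Thus, for any $w \in \Sigma^*$:
$\delta^*(q_0, w) \in Q$
$\equiv$ { by the properties of dfas and sets }
Either $\delta^*(q_0, w) \in F$ or $\delta^*(q_0, w) \in Q - F$
$\equiv$ { Def. 2.2: language accepted by dfa }
Either $w \in L(M) = L_1$ or $w \in \overline{L_1}$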
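Let's construct dfa $\widehat{M} = (Q, \Sigma, \delta, q_0, Q - F)$, which is $M$ with its final and nonfinal states interchanged.
Clearly, $L(\widehat{M}) = \overline{L_1}$. Thus $\overline{L_1}$ is a regular language. QED.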
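The complement construction can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical `DFA` class whose transition dictionary is total (one entry for every state and symbol); the machine `even_as` is an illustrative example, not one from the text.

```python
import itertools
from dataclasses import dataclass

@dataclass
class DFA:
    """A dfa (Q, Sigma, delta, q0, F); delta maps (state, symbol) -> state and must be total."""
    states: frozenset
    alphabet: frozenset
    delta: dict
    start: object
    finals: frozenset

    def accepts(self, w: str) -> bool:
        q = self.start
        for a in w:                  # compute delta*(q0, w) one symbol at a time
            q = self.delta[(q, a)]
        return q in self.finals

def complement(m: DFA) -> DFA:
    """Same states, transitions, and initial state; the final states become Q - F."""
    return DFA(m.states, m.alphabet, m.delta, m.start, m.states - m.finals)

# Hypothetical example: strings over {a, b} containing an even number of a's.
even_as = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset("ab"),
    delta={("even", "a"): "odd", ("odd", "a"): "even",
           ("even", "b"): "even", ("odd", "b"): "odd"},
    start="even",
    finals=frozenset({"even"}),
)
odd_as = complement(even_as)

# Spot check: every string of length <= 5 is accepted by exactly one of the two machines.
for n in range(6):
    for t in itertools.product("ab", repeat=n):
        w = "".join(t)
        assert even_as.accepts(w) != odd_as.accepts(w)
print(odd_as.accepts("ab"), odd_as.accepts("aba"))  # True False
```

The construction relies on $\delta$ being total, as it is for a dfa; simply interchanging the final and nonfinal states of an nfa does not, in general, produce an nfa for the complement.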
Proof of $L_1 \cap L_2$
Strategy: Given two dfas for the two regular languages, construct a new dfa that accepts a string if and only if both original dfas accept the string.
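Let $L_1 = L(M_1)$ and $L_2 = L(M_2)$ for dfas $M_1 = (Q, \Sigma, \delta_1, q_0, F_1)$ and $M_2 = (P, \Sigma, \delta_2, p_0, F_2)$.
Construct $\widehat{M} = (Q \times P, \Sigma, \widehat{\delta}, (q_0, p_0), \widehat{F})$, where $\widehat{F} = \{(q_i, p_j) : q_i \in F_1, p_j \in F_2\}$ and
$\widehat{\delta}((q_i, p_j), a) = (q_k, p_l)$ when $\delta_1(q_i, a) = q_k$ and $\delta_2(p_j, a) = p_l$.
Because $\widehat{\delta}^*((q_0, p_0), w) = (\delta_1^*(q_0, w), \delta_2^*(p_0, w))$, clearly $w \in L_1 \cap L_2$ if and only if $w$ is accepted by $\widehat{M}$.
Thus, $L_1 \cap L_2$ is regular. QED.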
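A minimal Python sketch of this product construction, assuming each dfa is given as a triple `(delta, start, finals)` with a total transition dictionary; `M1` and `M2` are hypothetical examples. Unlike the proof, which uses all of $Q \times P$, the sketch builds only the pairs reachable from $(q_0, p_0)$, since unreachable pairs cannot affect the accepted language.

```python
import itertools

def product_dfa(d1, d2, alphabet):
    """Intersection by the product construction.

    Each dfa is a triple (delta, start, finals) with delta: (state, symbol) -> state (total).
    """
    delta1, q0, f1 = d1
    delta2, p0, f2 = d2
    start = (q0, p0)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        q, p = todo.pop()
        for a in alphabet:
            nxt = (delta1[(q, a)], delta2[(p, a)])   # hat-delta((qi, pj), a) = (qk, pl)
            delta[((q, p), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    finals = {(q, p) for (q, p) in seen if q in f1 and p in f2}
    return delta, start, finals

def accepts(dfa, w):
    delta, q, finals = dfa
    for a in w:
        q = delta[(q, a)]
    return q in finals

# Hypothetical examples over {a, b}:
# M1 accepts strings with an even number of a's; M2 accepts strings ending in b.
M1 = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
M2 = ({("x", "a"): "x", ("y", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}, "x", {"y"})
M = product_dfa(M1, M2, "ab")
print(accepts(M, "aab"), accepts(M, "ab"))  # True False
```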
The previous proof is constructive.
It establishes the desired result.
It provides an algorithm for building an item of interest (e.g., a dfa to accept $L_1 \cap L_2$).
Sometimes nonconstructive proofs are shorter and easier to understand. But they provide no algorithm.
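Alternate (nonconstructive) proof for $L_1 \cap L_2$
$L_1$ and $L_2$ are regular
$\Rightarrow$ { previously proved part of Theorem 4.1 (complementation) }
$\overline{L_1}$ and $\overline{L_2}$ are regular
$\Rightarrow$ { previously proved part of Theorem 4.1 (union) }
$\overline{L_1} \cup \overline{L_2}$ is regular
$\Rightarrow$ { previously proved part of Theorem 4.1 (complementation) }
$\overline{\overline{L_1} \cup \overline{L_2}}$ is regular
$\equiv$ { De Morgan's Law for sets }
$L_1 \cap L_2$ is regular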
QED.
Consider the difference between two regular languages $L_1$ and $L_2$, written $L_1 - L_2$.
This is just set difference, which can be written as $L_1 - L_2 = L_1 \cap \overline{L_2}$.
From Theorem 4.1 above, we know that regular languages are closed under both complementation and intersection. Thus, regular languages are closed under difference as well.
Linz Theorem 4.2 (Closure under Reversal): The family of regular languages is closed under reversal.
Proof (constructive)
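Strategy: Construct an nfa for the regular language and then reverse all the edges and exchange the roles of the initial and final states.
Let $L$ be a regular language. Construct an nfa $M$ such that $L = L(M)$ and $M$ has a single final state. (We can add $\lambda$-transitions from the previous final states to create a single new final state.)
Now construct a new nfa $\widehat{M}$ as follows: make the initial state of $M$ the final state of $\widehat{M}$, make the final state of $M$ the initial state of $\widehat{M}$, and reverse the direction of every edge.
Thus nfa $\widehat{M}$ accepts $w^R$ if and only if the original nfa $M$ accepts $w$; that is, $L(\widehat{M}) = L^R$. QED.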
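The reversal construction is short enough to sketch directly. This is a minimal Python sketch, assuming an nfa with a single final state given as a set of `(source, label, target)` edges, with the empty string `""` standing in for $\lambda$ (an assumption of this sketch); acceptance is checked by tracking the set of states the nfa could be in. The nfa `EDGES` is a hypothetical example.

```python
import itertools

LAMBDA = ""  # label used for lambda-transitions in this sketch (an assumption)

def reverse_nfa(edges, start, final):
    """Reverse every edge and swap the roles of the initial and (single) final state.

    edges is a set of (source, label, target) triples.
    """
    return {(t, a, s) for (s, a, t) in edges}, final, start

def nfa_accepts(edges, start, final, w):
    """Simulate the nfa by tracking the set of states it could be in."""
    def closure(states):
        # add every state reachable through lambda-transitions
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for (s, a, t) in edges:
                if s == q and a == LAMBDA and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for c in w:
        current = closure({t for (s, a, t) in edges if s in current and a == c})
    return final in current

# Hypothetical nfa for strings over {a, b} that start with a and end with b.
EDGES = {(0, "a", 1), (1, "a", 1), (1, "b", 1), (1, "b", 2)}
R_EDGES, R_START, R_FINAL = reverse_nfa(EDGES, 0, 2)

print(nfa_accepts(EDGES, 0, 2, "aab"), nfa_accepts(R_EDGES, R_START, R_FINAL, "baa"))  # True True
```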
In mathematics, a homomorphism is a mapping between two mathematical structures that preserves the essential structure.
Linz Definition 4.1 (Homomorphism): Suppose $\Sigma$ and $\Gamma$ are alphabets. A function
$h: \Sigma \rightarrow \Gamma^*$
is called a homomorphism.
In words, a homomorphism is a substitution in which a single letter is replaced with a string.
We can extend the domain of a function $h$ to strings in an obvious fashion. If
$w = a_1 a_2 \cdots a_n$
for $a_i \in \Sigma$, then
$h(w) = h(a_1) h(a_2) \cdots h(a_n)$.
If $L$ is a language on $\Sigma$, then we define its homomorphic image as
$h(L) = \{h(w) : w \in L\}$.
Note: The homomorphism function $h$ preserves the essential structure of the language. In particular, it preserves the concatenation operation on strings, i.e., $h(\lambda) = \lambda$ and $h(uv) = h(u) h(v)$.
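Let $\Sigma = \{a, b\}$ and $\Gamma = \{a, b, c\}$.
Define $h$ as follows:
$h(a) = ab$, $h(b) = bbc$
Then $h(aba) = abbbcab$.
The homomorphic image of $L = \{aa, aba\}$ is the language $h(L) = \{abab, abbbcab\}$.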
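The extension of $h$ to strings and to finite languages translates directly into code. Below is a minimal Python sketch, assuming $h$ is represented as a dictionary from symbols to strings; it uses the mapping $h(a) = ab$, $h(b) = bbc$, and the function names `apply_h` and `image` are hypothetical.

```python
def apply_h(h: dict, w: str) -> str:
    """Extend h: Sigma -> Gamma* to strings: h(a1 a2 ... an) = h(a1) h(a2) ... h(an)."""
    return "".join(h[a] for a in w)   # the empty string maps to the empty string

def image(h: dict, language: set) -> set:
    """Homomorphic image h(L) = { h(w) : w in L } of a finite language L."""
    return {apply_h(h, w) for w in language}

h = {"a": "ab", "b": "bbc"}
print(apply_h(h, "aba"))                # abbbcab
print(sorted(image(h, {"aa", "aba"})))  # ['abab', 'abbbcab']
```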
If we have a regular expression $r$ for a language $L$, then a regular expression for $h(L)$ can be obtained by simply applying the homomorphism to each $\Sigma$ symbol of $r$. We show this in the next example.
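For $\Sigma = \{a, b\}$ and $\Gamma = \{b, c, d\}$, define $h$:
$h(a) = dbcc$, $h(b) = bdc$
If $L$ is a regular language denoted by the regular expression $r = (a + b^*)(aa)^*$,
then $r_1 = (dbcc + (bdc)^*)(dbccdbcc)^*$
denotes the regular language $h(L)$.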
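Applying $h$ symbol by symbol to a regular expression can be sketched with Python's `re` module, assuming the expression is written in Python syntax (`|` for Linz's `+`) and that its only letters are symbols of $\Sigma$. The mapping $h(a) = dbcc$, $h(b) = bdc$ and the expression $(a + b^*)(aa)^*$ are used for illustration; `h_regex` is a hypothetical helper name.

```python
import itertools
import re

def h_regex(h: dict, r: str) -> str:
    """Apply h to each alphabet symbol of the regular expression r.

    Each image is wrapped in a non-capturing group so that a following '*'
    applies to the whole string h(a), not just its last character.
    """
    return "".join(f"(?:{h[c]})" if c in h else c for c in r)

def apply_h(h: dict, w: str) -> str:
    return "".join(h[c] for c in w)

h = {"a": "dbcc", "b": "bdc"}
r = "(a|b*)(aa)*"     # (a + b*)(aa)* written with '|' for '+'
r1 = h_regex(h, r)
print(r1)  # ((?:dbcc)|(?:bdc)*)((?:dbcc)(?:dbcc))*

# Spot check: for every w in L(r) up to length 6, h(w) is in L(r1).
for n in range(7):
    for t in itertools.product("ab", repeat=n):
        w = "".join(t)
        if re.fullmatch(r, w):
            assert re.fullmatch(r1, apply_h(h, w))
```

The spot check only goes one way ($w \in L(r)$ implies $h(w) \in L(r_1)$); the claim in the text is the full equality $L(r_1) = h(L)$.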
The general result on the closure of regular languages under any homomorphism follows from this example in an obvious manner.
Linz Theorem 4.3 (Closure under Homomorphism): Let $h$ be a homomorphism. If $L$ is a regular language, then its homomorphic image $h(L)$ is also regular.
Proof: Similar to the argument in Example 4.3. See Linz textbook for full proof.
The family of regular languages is therefore closed under arbitrary homomorphisms.
Linz Definition 4.2 (Right Quotient): Let $L_1$ and $L_2$ be languages on the same alphabet. Then the right quotient of $L_1$ with $L_2$ is defined as
$L_1 / L_2 = \{x : xy \in L_1 \text{ for some } y \in L_2\}$.
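Consider languages $L_1$ and $L_2$ such that
$L_1 = \{a^n b^m : n \geq 1, m \geq 0\} \cup \{ba\}$ and $L_2 = \{b^m : m \geq 1\}$.
Then
$L_1 / L_2 = \{a^n b^m : n \geq 1, m \geq 0\}$.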
The strings in $L_2$ consist of one or more $b$'s. Therefore, we arrive at the answer by removing one or more $b$'s from those strings in $L_1$ that terminate with at least one $b$ as a suffix.
Note that in this example $L_1$, $L_2$, and $L_1 / L_2$ are regular.
Can we construct a dfa for $L_1 / L_2$ from dfas for $L_1$ and $L_2$?
Linz Figure 4.1 shows a dfa $M$ that accepts $L_1$.
An automaton for $L_1 / L_2$ must accept any $x$ such that $xy \in L_1$ and $y \in L_2$.
For all states $q$ of $M$, if there exists a walk labeled $v$ from $q$ to a final state such that $v \in L_2$, then make $q$ a final state of the automaton for $L_1 / L_2$.
In this example, we check each state of $M$ to see if there is a walk labeled with a string in $L_2$ to any of the final states $q_1$, $q_2$, or $q_4$.
$q_1$ and $q_2$ have such walks.
$q_0$, $q_3$, and $q_4$ do not.
The resulting automaton is shown in Linz Figure 4.2.
The next theorem generalizes this construction.
Linz Theorem 4.4 (Closure under Right Quotient): If $L_1$ and $L_2$ are regular languages, then $L_1 / L_2$ is also regular. We say that the family of regular languages is closed under right quotient with a regular language.
Proof
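Let $M = (Q, \Sigma, \delta, q_0, F)$ be a dfa such that $L_1 = L(M)$.
Construct dfa $\widehat{M} = (Q, \Sigma, \delta, q_0, \widehat{F})$ for $L_1 / L_2$ as follows.
For all $q_i \in Q$, let dfa $M_i = (Q, \Sigma, \delta, q_i, F)$. That is, dfa $M_i$ is the same as $M$ except that it starts at $q_i$.
From Theorem 4.1, we know $L(M_i) \cap L_2$ is regular. Thus we can construct the intersection machine as shown in the proof of Theorem 4.1.
If there is any path in the intersection machine from its initial state to a final state, then $L(M_i) \cap L_2 \neq \emptyset$. In that case, we put $q_i$ in $\widehat{F}$ in machine $\widehat{M}$.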
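Does $L(\widehat{M}) = L_1 / L_2$?
First, let $x \in L_1 / L_2$.
By definition, there must be $y \in L_2$ such that $xy \in L_1$.
Thus $\delta^*(q_0, xy) \in F$.
There must be some $q \in Q$ such that $\delta^*(q_0, x) = q$ and $\delta^*(q, y) \in F$.
Thus, by construction, $q \in \widehat{F}$. Hence, $\widehat{M}$ accepts $x$.
Now, let $x$ be accepted by $\widehat{M}$.
Then $\delta^*(q_0, x) = q \in \widehat{F}$.
Thus, by construction, we know there is a $y \in L_2$ such that $\delta^*(q, y) \in F$.
Thus $xy \in L_1$, so $x \in L_1 / L_2$. Therefore $L(\widehat{M}) = L_1 / L_2$, which means $L_1 / L_2$ is regular. QED.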
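The construction in this proof can be sketched in Python, assuming both dfas are triples `(delta, start, finals)` with total transition dictionaries. For each state $q$, the sketch searches the product of "$M$ started at $q$" with the dfa for $L_2$ for a reachable pair of final states, which is exactly the test $L(M_q) \cap L_2 \neq \emptyset$. The dfas `D1` (for $L(a^*baa^*)$) and `D2` (for $L(ab^*)$) are hypothetical, and their state names are our own.

```python
import itertools
import re

def quotient_finals(m1, m2, alphabet):
    """Final states of the dfa for L1/L2, following the proof of Theorem 4.4."""
    delta1, _, f1 = m1
    delta2, p0, f2 = m2
    new_finals = set()
    for q in {s for (s, _) in delta1}:
        # search the product of M_q (m1 started at q) and m2 for a final pair
        seen, todo = {(q, p0)}, [(q, p0)]
        while todo:
            s, p = todo.pop()
            if s in f1 and p in f2:      # some y in L2 leads from q to a final state
                new_finals.add(q)
                break
            for a in alphabet:
                nxt = (delta1[(s, a)], delta2[(p, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return new_finals

def accepts(delta, start, finals, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in finals

# Hypothetical dfas for L1 = L(a*baa*) (state 3 is a trap) and L2 = L(ab*).
D1 = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 3,
      (2, "a"): 2, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}
D2 = {("p0", "a"): "p1", ("p0", "b"): "p2", ("p1", "a"): "p2",
      ("p1", "b"): "p1", ("p2", "a"): "p2", ("p2", "b"): "p2"}
M1, M2 = (D1, 0, {2}), (D2, "p0", {"p1"})

F_hat = quotient_finals(M1, M2, "ab")
print(sorted(F_hat))  # [1, 2]
```

With these final states, the machine `(D1, 0, F_hat)` accepts exactly the strings of $L(a^*ba^*)$.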
Find $L_1 / L_2$ for $L_1 = L(a^*baa^*)$ and $L_2 = L(ab^*)$.
We apply the construction (algorithm) used in the proof of Theorem 4.4.
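Linz Figure 4.3 shows a dfa for $L_1$.
Let $M = (\{q_0, q_1, q_2, q_3\}, \{a, b\}, \delta, q_0, \{q_2\})$ be this dfa, where $q_0$ loops on $a$, $q_1$ is reached on the $b$, $q_2$ is the only final state, and $q_3$ is a trap state.
Thus if we construct the sequence of machines $M_0, M_1, M_2, M_3$, we find $L(M_0) \cap L_2 = \emptyset$, $L(M_1) \cap L_2 = \{a\} \neq \emptyset$, $L(M_2) \cap L_2 = \{a\} \neq \emptyset$, and $L(M_3) \cap L_2 = \emptyset$,
so $\widehat{F} = \{q_1, q_2\}$, and the resulting dfa for $L_1 / L_2$ is shown in Linz Figure 4.4.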
The automaton shown in Figure 4.4 accepts the language denoted by the regular expression
$a^*b + a^*baa^*$,
which can be simplified to
$a^*ba^*$.
Fundamental question: Is $w \in L$?
It is difficult to find a membership algorithm for languages in general. But it is relatively easy to do for regular languages.
A regular language is given in a standard representation if and only if it is described with one of: a finite automaton (dfa or nfa), a regular expression, or a regular grammar.
Linz Theorem 4.5 (Membership): Given a standard representation of any regular language $L$ on $\Sigma$ and any $w \in \Sigma^*$, there exists an algorithm for determining whether or not $w$ is in $L$.
Proof
We represent the language by some dfa, then test to see if $w$ is accepted by this automaton. QED.
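The proof assumes that any standard representation can be turned into a dfa. As one piece of that pipeline, here is a minimal Python sketch of the subset construction from an nfa (assumed here, for brevity, to have no $\lambda$-transitions) followed by the membership test itself. The nfa `NFA` is a hypothetical example, and the function names are our own.

```python
import itertools

def subset_construction(nfa_delta, start, finals, alphabet):
    """Convert an nfa without lambda-transitions to a dfa.

    nfa_delta maps (state, symbol) -> set of states. Each dfa state is a frozenset
    of nfa states; only subsets reachable from {start} are built.
    """
    d_start = frozenset({start})
    d_delta, seen, todo = {}, {d_start}, [d_start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(q for s in S for q in nfa_delta.get((s, a), ()))
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    d_finals = {S for S in seen if S & finals}
    return d_delta, d_start, d_finals

def member(dfa, w):
    """Membership test: run the dfa on w (w must be a string over the alphabet)."""
    delta, q, finals = dfa
    for c in w:
        q = delta[(q, c)]
    return q in finals

# Hypothetical nfa: strings over {a, b} whose second-to-last symbol is a.
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
DFA = subset_construction(NFA, 0, {2}, "ab")
print(member(DFA, "bab"), member(DFA, "abb"))  # True False
```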
Linz Theorem 4.6 (Finiteness): There exists an algorithm for determining whether a regular language, given in standard representation, is empty, finite, or infinite.
Proof
Represent $L$ as a transition graph of a dfa.
If a simple path exists from the initial state to any final state, then $L$ is not empty. Otherwise, it is empty.
If any vertex on a cycle is on some path from the initial state to a final state, then the language is infinite. Otherwise, it is finite.
QED.
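The graph tests in this proof can be sketched in Python, assuming a dfa given as a total transition dictionary. The sketch keeps only the "useful" states (reachable from the initial state and able to reach a final state) and then looks for a cycle among them with Kahn's algorithm, a standard topological-sort technique chosen here for brevity. The machines `FINITE` and `EVEN_AS` are hypothetical examples.

```python
def classify(delta, start, finals, alphabet):
    """Return 'empty', 'finite', or 'infinite' for the language of a total dfa."""
    states = {s for (s, _) in delta} | {start}
    succ = {s: {delta[(s, a)] for a in alphabet} for s in states}
    pred = {s: set() for s in states}
    for s in states:
        for t in succ[s]:
            pred[t].add(s)

    def reach(sources, edges):
        seen, todo = set(sources), list(sources)
        while todo:
            for t in edges[todo.pop()]:
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen

    forward = reach({start}, succ)
    if not (finals & forward):       # no path from the initial state to a final state
        return "empty"
    useful = forward & reach(finals & states, pred)

    # Kahn's algorithm on the useful subgraph: if some state can never be
    # removed, the subgraph contains a cycle.
    indeg = {s: len(pred[s] & useful) for s in useful}
    queue = [s for s in useful if indeg[s] == 0]
    removed = 0
    while queue:
        s = queue.pop()
        removed += 1
        for t in succ[s] & useful:
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    return "finite" if removed == len(useful) else "infinite"

# Hypothetical dfas over {a, b}.
FINITE = {(0, "a"): 1, (0, "b"): 3, (1, "a"): 3, (1, "b"): 2,
          (2, "a"): 3, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}   # with finals {1, 2}: L = {a, ab}
EVEN_AS = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}  # with finals {0}: even number of a's
print(classify(FINITE, 0, {1, 2}, "ab"), classify(EVEN_AS, 0, {0}, "ab"), classify(EVEN_AS, 0, set(), "ab"))
# finite infinite empty
```

Note that the trap state 3 in `FINITE` lies on a cycle but cannot reach a final state, so it does not make the language infinite.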
Consider the question: Is $L_1 = L_2$?
This is practically important. But it is a difficult issue because there are many ways to represent $L_1$ and $L_2$.
Linz Theorem 4.7 (Equality): Given a standard representation of two regular languages $L_1$ and $L_2$, there exists an algorithm to determine whether or not $L_1 = L_2$.
Proof
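Let $L_3 = (L_1 \cap \overline{L_2}) \cup (\overline{L_1} \cap L_2)$.
By closure, $L_3$ is regular. Hence, there is a dfa $M$ that accepts $L_3$.
Because of Theorem 4.6, we can determine whether $L_3$ is empty or not.
But from Exercise 8, Section 1.1, we see that $L_3 = \emptyset$ if and only if $L_1 = L_2$. QED.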
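A minimal Python sketch of this decision procedure, assuming both languages are given as total dfas `(delta, start, finals)`. Instead of building $L_3$ from separate complement, intersection, and union machines, the sketch uses one product automaton whose final pairs are those where exactly one component is final (this accepts $L_3$), and runs a breadth-first emptiness test on it. It also returns a shortest string in $L_3$ when the languages differ. The machines below are hypothetical examples.

```python
from collections import deque

def equivalent(m1, m2, alphabet):
    """Decide L(m1) == L(m2); returns (True, None) or (False, w) with w a shortest string in L3."""
    (d1, q0, f1), (d2, p0, f2) = m1, m2
    start = (q0, p0)
    parent = {start: None}          # breadth-first search tree: pair -> (previous pair, symbol)
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        q, p = pair
        if (q in f1) != (p in f2):  # pair is final for L3: rebuild the string that led here
            symbols = []
            while parent[pair] is not None:
                pair, a = parent[pair]
                symbols.append(a)
            return False, "".join(reversed(symbols))
        for a in alphabet:
            nxt = (d1[(q, a)], d2[(p, a)])
            if nxt not in parent:
                parent[nxt] = (pair, a)
                queue.append(nxt)
    return True, None

# Hypothetical dfas over {a, b} that count a's (b leaves the state unchanged).
MOD2 = {(i, c): ((i + 1) % 2 if c == "a" else i) for i in range(2) for c in "ab"}
MOD4 = {(i, c): ((i + 1) % 4 if c == "a" else i) for i in range(4) for c in "ab"}
EVEN_2 = (MOD2, 0, {0})      # even number of a's, 2 states
EVEN_4 = (MOD4, 0, {0, 2})   # same language, 4 states
DIV_4 = (MOD4, 0, {0})       # number of a's divisible by 4

print(equivalent(EVEN_2, EVEN_4, "ab"))  # (True, None)
print(equivalent(EVEN_2, DIV_4, "ab"))   # (False, 'aa')
```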
A regular language may be infinite. But, in processing a string, the amount of information that the automaton must "remember" is strictly limited (finite and bounded).
In mathematics, the pigeonhole principle refers to the following simple observation:
If we put $n$ objects into $m$ boxes (pigeonholes), and if $n > m$, then at least one box must hold more than one item.
This is obvious, but it has deep implications.
Is the language $L = \{a^n b^n : n \geq 0\}$ regular?
The answer is no, as we show below.
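Proof that $L$ is not regular
Strategy: Use proof by contradiction. Assume that what we want to prove is false. Show that this introduces a contradiction. Hence, the assumption must be false, so what we want to prove must be true.
Assume $L$ is regular.
Thus there exists a dfa $M = (Q, \{a, b\}, \delta, q_0, F)$ such that $L(M) = L$.
Machine $M$ has a specific number of states. However, the number of $a$'s in a string in $L$ is finite but unbounded (i.e., there is no maximum value for the length). Consider the states $\delta^*(q_0, a^i)$ for $i = 0, 1, 2, \ldots$. Because there are more values of $i$ than states in $M$, according to the pigeonhole principle, there must be some state $q$ such that
$\delta^*(q_0, a^n) = q$ and $\delta^*(q_0, a^m) = q$
with $n \neq m$. But, because $M$ accepts $a^n b^n$,
$\delta^*(q, b^n) = q_f$ for some $q_f \in F$.
From this we reason as follows:
$\delta^*(q_0, a^m b^n) = \delta^*(\delta^*(q_0, a^m), b^n) = \delta^*(q, b^n) = q_f$.
But this contradicts the assumption that $M$ accepts $a^m b^n$ only if $m = n$. Therefore, $L$ cannot be regular. QED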
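The pigeonhole argument is constructive enough to run: given any dfa over $\{a, b\}$, it produces a string on which that dfa disagrees with $L = \{a^n b^n : n \geq 0\}$. Below is a minimal Python sketch, assuming dfas are given as total transition dictionaries; the machines `MOD3`, `A_STAR_B_STAR`, and `REJECT_ALL` are hypothetical candidates that each get refuted.

```python
def run(delta, start, finals, w):
    """True if the total dfa (delta, start, finals) accepts w."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in finals

def refute_anbn(delta, start, finals):
    """Return a string on which the given dfa disagrees with L = { a^n b^n : n >= 0 }.

    Among delta*(q0, a^0), delta*(q0, a^1), ... some state must repeat, say
    delta*(q0, a^n) = delta*(q0, a^m) with n < m. Then a^n b^n and a^m b^n drive
    the dfa to the same state, yet only the first is in L.
    """
    first_seen = {}               # state -> smallest i with delta*(q0, a^i) = state
    q, i = start, 0
    while q not in first_seen:    # ends after at most |Q| + 1 states are visited
        first_seen[q] = i
        q, i = delta[(q, "a")], i + 1
    n, m = first_seen[q], i
    good, bad = "a" * n + "b" * n, "a" * m + "b" * n
    # If the dfa accepts good, it must also accept bad (same final state), which is wrong.
    return bad if run(delta, start, finals, good) else good

# Hypothetical candidate dfas.
MOD3 = {(s, c): (s + (1 if c == "a" else -1)) % 3 for s in range(3) for c in "ab"}  # (#a - #b) mod 3
A_STAR_B_STAR = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1, (2, "a"): 2, (2, "b"): 2}
REJECT_ALL = {(0, "a"): 0, (0, "b"): 0}

print(repr(refute_anbn(MOD3, 0, {0})))                 # 'aaa'
print(repr(refute_anbn(A_STAR_B_STAR, 0, {0, 1})))     # 'a'
print(repr(refute_anbn(REJECT_ALL, 0, set())))         # ''
```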
We can use the pigeonhole principle to make "finite memory" precise.
Linz Theorem 4.8 (Pumping Lemma for Regular Languages): Let $L$ be an infinite regular language. There exists some positive integer $m$ such that any $w \in L$ with $|w| \geq m$ can be decomposed as
$w = xyz$
with
$|xy| \leq m$ and $|y| \geq 1$
such that
$w_i = xy^iz$
is also in $L$ for all $i = 0, 1, 2, \ldots$.
That is, we can break every sufficiently long string from $L$ into three parts in such a way that an arbitrary number of repetitions of the middle part yields another string in $L$.
We can "pump" the middle string, which gives us the name pumping lemma for this theorem.
Proof
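Let $L$ be an infinite regular language. Thus there exists a dfa $M$ that accepts $L$. Let $M$ have states $q_0, q_1, q_2, \ldots, q_n$.
Consider a string $w \in L$ such that $|w| \geq m = n + 1$. Such a string exists because $L$ is infinite.
Consider the sequence of states $q_0, q_i, q_j, \ldots, q_f$ that $M$ traverses as it processes $w$.
The length of this sequence is exactly $|w| + 1 \geq n + 2$, but $M$ has only $n + 1$ states. Thus, according to the pigeonhole principle, at least one state must be repeated, and such a repetition must start no later than the $n$th move.
Thus the sequence is of the form
$q_0, q_i, q_j, \ldots, q_r, \ldots, q_r, \ldots, q_f$.
Then there are substrings $x$, $y$, and $z$ of $w$ such that
$\delta^*(q_0, x) = q_r$, $\delta^*(q_r, y) = q_r$, and $\delta^*(q_r, z) = q_f$,
with $|xy| \leq n + 1 = m$ and $|y| \geq 1$. Thus, for any $i \geq 0$,
$\delta^*(q_0, xy^iz) = q_f$, so $xy^iz \in L$.
QED.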
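The decomposition in this proof can be computed by walking the dfa and stopping at the first repeated state. Below is a minimal Python sketch, assuming a total transition dictionary; the 4-state dfa `D` for $L(a^*baa^*)$ is a hypothetical example, and `pump_split` is our own name.

```python
import re

def pump_split(delta, start, w):
    """Split w into (x, y, z) as in the proof of Theorem 4.8.

    x drives q0 to the first repeated state q_r and y drives q_r back to q_r,
    so |y| >= 1 and |xy| <= number of states. Requires len(w) >= number of states.
    """
    first_visit = {start: 0}          # state -> number of symbols read when first entered
    q = start
    for pos, c in enumerate(w, start=1):
        q = delta[(q, c)]
        if q in first_visit:          # q is the repeated state q_r
            j = first_visit[q]
            return w[:j], w[j:pos], w[pos:]
        first_visit[q] = pos
    raise ValueError("no state repeated; w is shorter than the number of states")

# Hypothetical 4-state dfa for L(a*baa*); state 2 is final, state 3 is a trap.
D = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 3,
     (2, "a"): 2, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}

x, y, z = pump_split(D, 0, "baaa")
print((x, y, z))  # ('ba', 'a', 'a')
for i in range(5):
    assert re.fullmatch("a*baa*", x + y * i + z)   # x y^i z stays in the language
```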
We can use the pumping lemma to show that languages are not regular. Each of these is a proof by contradiction.
Show that $L = \{a^n b^n : n \geq 0\}$ is not regular.
Assume that is regular, so that the Pumping Lemma must hold.
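If, for some $x$, $z$, and nonempty $y$, both $xyz$ and $xy^2z$ are in $L$, then $y$ must be all $a$'s or all $b$'s.
We do not know what $m$ is, but, whatever $m$ is, the Pumping Lemma enables us to choose the string $w = a^m b^m$. Because $|xy| \leq m$, $y$ must consist entirely of $a$'s.
Suppose $|y| = k$. We must decompose $w$ as follows for some $j \geq 0$ and $k \geq 1$ with $j + k \leq m$:
$x = a^j$, $y = a^k$, $z = a^{m-j-k} b^m$
From the Pumping Lemma, with $i = 0$,
$w_0 = xz = a^{m-k} b^m \in L$.
Clearly, this is not in $L$. But this contradicts the Pumping Lemma.
Hence, the assumption that $L$ is regular is false. Thus $L$ is not regular.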
The Pumping Lemma guarantees the existence of $m$ and of a decomposition $xyz$ for any string $w$ with $|w| \geq m$ in an infinite regular language.
But we do not know what $m$ and $xyz$ are.
We do not have a contradiction if the Pumping Lemma is violated for some specific $m$ or $xyz$.
The Pumping Lemma holds for all $w \in L$ with $|w| \geq m$ and for all $i \geq 0$ (i.e., for all $xy^iz$).
We can thus conceptualize a proof as a game against an opponent.
Our goal: Establish a contradiction of the Pumping Lemma.
Opponent's goal: Stop us.
Moves:
1. The opponent picks $m$.
2. Given $m$, we pick a string $w$ in $L$ of length equal to or greater than $m$. We are free to choose any $w$, subject to the requirements $w \in L$ and $|w| \geq m$.
3. The opponent chooses the decomposition $xyz$, subject to $|xy| \leq m$ and $|y| \geq 1$. We have to assume that the opponent makes the choice that will make it hardest for us to win the game.
4. We try to pick $i$ in such a way that the pumped string $w_i = xy^iz$ is not in $L$. If we can do so, we win the game.
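Strategy: A strategy that lets us win whatever choices the opponent makes amounts to a proof that the language is not regular. Step 2 is crucial: we must choose $w$ so that every decomposition the opponent can pick in step 3 can be pumped out of $L$ in step 4.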
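For small values of $m$, the game can be played by brute force. The Python sketch below is a hypothetical helper, assuming a language is given as a membership predicate: it checks, for one fixed $m$ and one choice of $w$, whether every legal decomposition can be pumped out of $L$ with some $i$ in a small range. Because it examines only one $m$ and finitely many $i$, it gives evidence for a chosen strategy, not a proof.

```python
def we_win(w, m, in_lang, max_i=3):
    """True if, for EVERY decomposition w = xyz with |xy| <= m and |y| >= 1,
    some i in 0..max_i gives x y^i z outside L."""
    assert in_lang(w) and len(w) >= m, "step 2 requires w in L and |w| >= m"
    for xy_len in range(1, m + 1):
        for y_len in range(1, xy_len + 1):
            x, y, z = w[:xy_len - y_len], w[xy_len - y_len:xy_len], w[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                return False     # the opponent has a decomposition that survives pumping
    return True

def is_anbn(w):                  # L = { a^n b^n : n >= 0 }
    k = len(w) // 2
    return w == "a" * k + "b" * k

def is_wwr(w):                   # L = { w w^R }: the even-length palindromes
    return len(w) % 2 == 0 and w == w[::-1]

for m in range(1, 6):
    print(m, we_win("a" * m + "b" * m, m, is_anbn),
          we_win("a" * m + "b" * (2 * m) + "a" * m, m, is_wwr))
```

For instance, choosing $w = a^{2m}$ for $\{ww^R\}$ makes `we_win` return `False` for every $m \geq 2$, because the opponent can answer with $y = aa$.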
Let $\Sigma = \{a, b\}$. Show that
$L = \{ww^R : w \in \Sigma^*\}$
is not regular.
We use the Pumping Lemma and assume $L$ is regular.
Whatever $m$ the opponent picks in step 1 (of the "game"), we can choose in step 2 the string
$w = a^m b^m b^m a^m$.
Because of this choice, and the requirement that $|xy| \leq m$, in step 3 the opponent must choose a $y$ that consists entirely of $a$'s, say $y = a^k$ with $1 \leq k \leq m$. Consider
$w_i = xy^iz \in L$ for all $i \geq 0$,
which must hold because of the Pumping Lemma.
In step 4, we use $i = 0$. The string $w_0 = a^{m-k} b^m b^m a^m$ has fewer $a$'s on the left than on the right and so cannot be of the form $ww^R$.
Therefore, the Pumping Lemma is violated. $L$ is not regular.
Warning: Be careful! There are ways we can go wrong in applying the Pumping Lemma.
If we choose $w$ too short in step 2 of this example (i.e., where the first $m$ symbols include two or more $b$'s), then the opponent can choose a $y$ having an even number of $b$'s. In that case, we could not have reached a violation of the Pumping Lemma in the last step.
Suppose we choose a string consisting of all $a$'s, say
$w = a^{2m}$,
which is in $L$. To defeat us, the opponent need only pick
$y = aa$.
Now $w_i$ is in $L$ for all $i$, and we lose.
We must assume the opponent does not make mistakes. If, in the case where we pick $w = a^{2m}$, the opponent picks
$y = a$,
then $w_0$ is a string of odd length and therefore not in $L$. But any argument is incorrect if it assumes the opponent fails to make the best possible choice (i.e., $y = aa$).
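For $\Sigma = \{a, b\}$, show that the language
$L = \{w \in \Sigma^* : n_a(w) < n_b(w)\}$
(where $n_a(w)$ is the number of $a$'s in $w$) is not regular.
We use the Pumping Lemma to show a contradiction. Assume $L$ is regular.
Suppose the opponent gives us $m$. Because we have complete freedom in choosing $w$, we pick $w = a^m b^{m+1}$. Now, because $|xy|$ cannot be greater than $m$, the opponent cannot do anything but pick a $y$ with all $a$'s, that is,
$y = a^k$
for $1 \leq k \leq m$.
We now pump up, using $i = 2$. The resulting string
$w_2 = a^{m+k} b^{m+1}$
is not in $L$, because $m + k \geq m + 1$. Therefore, the Pumping Lemma is violated. $L$ is not regular.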
Show that
$L = \{(ab)^n a^k : n > k, k \geq 0\}$
is not regular.
We use the Pumping Lemma to show a contradiction. Assume $L$ is regular.
Given some $m$, we pick as our string
$w = (ab)^{m+1} a^m$,
which is in $L$.
The opponent must decompose $w = xyz$ so that $|xy| \leq m$ and $|y| \geq 1$. Thus both $x$ and $y$ must be in the part of the string consisting of $ab$ pairs. The choice of $x$ does not affect the argument, so we can focus on the $y$ part.
If our opponent picks $y = a$, we can choose $i = 0$ and get a string not in $L((ab)^*a^*)$ and, hence, not in $L$. (There is a similar argument for $y = b$.)
If the opponent picks $y = ab$, we can choose $i = 0$ again. Now we get the string $(ab)^m a^m$, which is not in $L$. (There is a similar argument for $y = ba$.)
In a similar manner, we can counter any possible choice by the opponent. Thus, because of the contradiction, $L$ is not regular.
Note: This example is adapted from an earlier edition of the Linz textbook.
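Show that
$L = \{a^n : n \text{ is a perfect square}\}$
is not regular.
We use the Pumping Lemma to show a contradiction. Assume $L$ is regular.
Given the opponent's choice for $m$, we pick $w$ to be the string $a^{m^2}$ (unless the opponent picks $m = 1$, in which case we can use $a^4$ as $w$).
The possible decompositions $w = xyz$ (such that $|xy| \leq m$ and $|y| \geq 1$) differ only in the lengths of $x$ and $y$. Suppose the opponent picks $y$ such that
$|y| = k \leq m$.
According to the Pumping Lemma, $w_0 = xz = a^{m^2 - k} \in L$. But this string can only be in $L$ if there exists a $j$ such that
$m^2 - k = j^2$.
But this is impossible, because for $m \geq 2$ and $1 \leq k \leq m$ we know (see argument below) that
$(m - 1)^2 < m^2 - k < m^2$.
Therefore, $L$ is not regular.
Aside: To see that $(m - 1)^2 < m^2 - k$ for $m \geq 2$ and $k \leq m$, note that
$m^2 - k \geq m^2 - m > m^2 - 2m + 1 = (m - 1)^2$.
(The upper bound $m^2 - k < m^2$ holds because $k \geq 1$.)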
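The inequality in the Aside is easy to spot-check numerically. The short Python sketch below (a finite check for small $m$, not a proof) confirms that pumping $a^{m^2}$ down by any $k$ with $1 \leq k \leq m$ lands strictly between consecutive squares, and shows why $m = 1$ needs separate handling. The helper name `is_square` is our own.

```python
from math import isqrt

def is_square(n: int) -> bool:
    """True if n is a perfect square (n >= 0)."""
    return n >= 0 and isqrt(n) ** 2 == n

# For m >= 2 and 1 <= k <= m, the pumped-down length m^2 - k lies strictly between
# the consecutive squares (m - 1)^2 and m^2, so it is never a perfect square.
for m in range(2, 60):
    for k in range(1, m + 1):
        assert (m - 1) ** 2 < m * m - k < m * m
        assert not is_square(m * m - k)

# With m = 1, w = a^1 and y = a pump down to length 0 = 0^2, a perfect square;
# starting instead from a^4 pumps down to length 3, which is not.
print(is_square(1 * 1 - 1), is_square(4 - 1))  # True False
```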
Show that the language
$L = \{a^n b^k c^{n+k} : n \geq 0, k \geq 0\}$
is not regular.
Strategy: Instead of using the Pumping Lemma directly, we show that $L$ is related to another language we already know is nonregular. This may be an easier argument.
In this example, we use the closure property under homomorphism (Linz Theorem 4.3).
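Let $h$ be defined such that
$h(a) = a$, $h(b) = a$, $h(c) = c$.
Then
$h(L) = \{a^{n+k} c^{n+k} : n + k \geq 0\} = \{a^i c^i : i \geq 0\}$.
But we proved this language was not regular in Linz Example 4.6 (it is $\{a^n b^n : n \geq 0\}$ with $c$ in place of $b$). Therefore, because of closure under homomorphism, $L$ cannot be regular either.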
Alternative proof by contradiction
Assume $L$ is regular.
Thus $h(L)$ is regular by closure under homomorphism (Linz Theorem 4.3).
But we know $h(L)$ is not regular, so there is a contradiction.
Thus, $L$ is not regular.
Show that the language
$L = \{a^n b^l : n \neq l\}$
is not regular.
We use the Pumping Lemma, but this example requires more ingenuity to set up than previous examples.
Assume $L$ is regular.
Choosing a string with $n = l + 1$ or $n = l + 2$ will not lead to a contradiction.
In these cases, the opponent can always choose a decomposition $w = xyz$ (with $|xy| \leq m$ and $|y| \geq 1$) that will make it impossible to pump the string out of the language (that is, pump it so that it has an equal number of $a$'s and $b$'s). For $n = l + 1$, the opponent can choose $y$ to be an even number of $a$'s. For $n = l + 2$, the opponent can choose $y$ to be an odd number of $a$'s greater than 1.
We must be more creative. Suppose we choose $w = a^n b^l$ where $n = m!$ and $l = (m + 1)!$.
If the opponent decomposes $w = xyz$ (with $|xy| \leq m$ and $|y| \geq 1$), then $y$ must consist of all $a$'s, say $y = a^k$ with $1 \leq k \leq m$.
If we pump $i$ times, we generate the string $xy^iz$ where the number of $a$'s is
$m! + (i - 1)k$.
We can contradict the Pumping Lemma if we can pick $i$ such that
$m! + (i - 1)k = (m + 1)!$.
But we can do this, because it is always possible to choose
$i = 1 + \frac{m \cdot m!}{k}$.
For $k \leq m$, the expression $\frac{m \cdot m!}{k}$ is an integer, because $k$ divides $m!$.
Thus the generated string has $(m + 1)!$ occurrences of $a$ and $(m + 1)!$ occurrences of $b$, so it is not in $L$.
This introduces a contradiction of the Pumping Lemma. Thus $L$ is not regular.
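The arithmetic behind the choice of $i$ can be spot-checked for small $m$. This Python sketch (a finite check, not a proof) computes $i = 1 + m \cdot m! / k$ for every legal $k$ and confirms that $m! + (i - 1)k = (m + 1)!$. The helper name `pumps_needed` is our own.

```python
from math import factorial

def pumps_needed(m: int, k: int) -> int:
    """The i with m! + (i - 1) k == (m + 1)!, namely i = 1 + m * m! / k."""
    q, r = divmod(m * factorial(m), k)
    assert r == 0, "k must divide m * m!, which holds whenever 1 <= k <= m"
    return 1 + q

# Spot check: for every legal |y| = k, pumping i times yields exactly (m + 1)! a's.
for m in range(1, 9):
    for k in range(1, m + 1):
        i = pumps_needed(m, k)
        assert factorial(m) + (i - 1) * k == factorial(m + 1)
print(pumps_needed(3, 2))  # 10
```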
Alternative argument (more elegant)
Suppose $L$ is regular.
Because of complementation closure, $\overline{L}$ is regular.
Let $L_1 = \overline{L} \cap L(a^*b^*)$.
But $L(a^*b^*)$ is regular and thus, by intersection closure, $L_1$ is also regular.
But $L_1 = \{a^n b^n : n \geq 0\}$, which we have shown to be nonregular. Thus we have a contradiction, so $L$ is not regular.
The Pumping Lemma is difficult to understand and, hence, difficult to apply.
Here are a few suggestions to avoid pitfalls in use of the Pumping Lemma.
Do not attempt to use the Pumping Lemma to show a language is regular. Only use it to show a language is not regular.
Make sure you start with a string that is in the language.
Avoid invalid assumptions about the decomposition of a string into $xyz$. Use only that $|xy| \leq m$ and $|y| \geq 1$.
Like most interesting "games", knowledge of the rules for use of the Pumping Lemma is necessary, but it is not sufficient to become a master "player". To master the use of the Pumping Lemma, one must work problems of various difficulties. Practice, practice, practice.