Copyright (C) 2015, H. Conrad Cunningham
Acknowledgements: These lecture notes are for use with Chapter 8 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012. The terminology and notation used in these notes are similar to those used in the Linz textbook. This document uses several figures from the Linz textbook.
Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.
Chapter 4 examines the closure properties of the family of regular languages, algorithms for determining various properties of regular languages, and methods for proving languages are not regular (e.g., the Pumping Lemma).
Chapter 8 examines similar aspects of the family of context-free languages.
Because of insufficient time and extensive coverage of the Pumping Lemma for regular languages, we will not cover the Pumping Lemmas for Context-Free Languages in this course. See section 8.1 of the Linz textbook if you are interested in this topic.
Linz Section 8.1 includes the following language examples. The
results of these are used in the remainder of this chapter.
Linz Example 8.1 shows is not context free.
Linz Example 8.2 shows is not context free.
Linz Example 8.3 shows is not context free.
Linz Example 8.4 shows is not context free.
Linz Section 8.1 includes the following definitions. (The definition of linear grammar is actually from Chapter 3.)
Definition (Linear Grammar): A linear grammar is a grammar in which at most one variable can appear on the right side of any production.
A linear context-free grammar is thus a context-free grammar that is also a linear grammar.
Linz Definition 8.5 (Linear Language): A context-free language is linear if there exists a linear context-free grammar such that .
Linz Section 8.1 also includes the following language examples.
Linz Example 8.5 shows is a linear language.
Linz Example 8.6 shows is not linear.
In most cases, the proofs and algorithms for the properties of regular languages rely upon manipulation of transition graphs for finite automata. Hence, they are relatively straightforward.
When we consider similar properties for context-free languages, we encounter more difficulties.
Let's consider closure under the simple set operations as we did for regular languages in Linz Theorem 4.1.
Linz Theorem 8.3 (Closure under Union, Concatenation, and Star-Closure): The family of context-free languages is closed under (a) union, (b) concatenation, and (c) star-closure.
(8.3a) Proof of Closure under Union:
Let and be context-free languages with the corresponding context-free grammars and .
Assume and are disjoint. (If not, we can make them so by renaming.)
Consider where
with:
-- i.e, is a fresh variable
Clearly, is a context-free grammar. So is a context-free language.
Now, we need to show that .
For , there is a derivation in :
Similarly, for , there is a derivation in :
Also, for , the first step of the derivation must be either (1) or (2) .
For choice 1, the sentential forms derived from only have variables from . But is disjoint from . Thus the derivation
can only involve productions from from . Hence, for choice 1, .
Using a similar argument for choice 2, we conclude .
Therefore, .
QED.
(8.3b) Proof of Closure under Concatenation:
Consider where
with:
Then follows from a similar argument to the one in part (a).
QED.
(8.3c) Proof of Closure under Star-Closure:
Consider where
with:
Then follows from a similar argument to the one in part (a).
QED.
Linz Theorem 8.4 (Non-closure under Intersection and Complementation): The family of context-free languages is not closed under (a) intersection and (b) complementation.
(8.4b) Proof of Non-closure under Intersection:
Assume the family of context-free languages is closed under intersection. Show that this leads to a contradiction.
It is sufficient to find two context-free languages whose intersection is not context-free.
Consider languages and defined as follows:
One way to show that a language is context-free is to find a context-free grammar that generates it. The following context-free grammar generates :
Alternatively, we could observe that is the concatenation of two context-free languages and, hence, context-free by Linz Theorem 8.3 above.
Similarly, we can show that is context free.
From the assumption, we thus have that is context free.
But
,
which is not context free. Linz proves this in Linz Example 8.1 (which is in the part of this chapter we did not cover in this course).
Thus we have a contradiction. Therefore, the family of context-free languages is not closed under intersection.
QED.
(8.4b) Proof of Non-closure under Complementation:
Assume the family of context-free languages is closed under complementation. Show that this leads to a contradiction.
Consider arbitrary context-free languages and .
From set theory, we know that
.
From Linz Theorem 8.3 and the assumption that context-free languages are closed under complementation, we deduce that the right side () is a context-free language for all and .
However, we know from part (a) that the left side () is not necessarily a context-free language for all and .
Thus we have a contradiction. Therefore, the family of context-free languages is not closed under complementation.
QED.
Although context-free languages are not, in general, closed under intersection, there is a useful special case that is closed.
Linz Theorem 8.5 (Closure Under Regular Intersection): Let be a context-free language and be a regular language. Then is context free.
Proof:
Let be an npda that accepts context-free language .
Let be a dfa that accepts regular language .
We construct an npda
that simulates and operating simultaneously (i.e., executes the moves of both machines for each input symbol).
We choose pairs of states from and to represent the states of as follows:
We specify such that the moves of correspond to simultaneous moves of and . That is,
if and only if
and
.
For moves in , we extend so that for all .
By induction on the length of the derivations, we can prove that
,
with and if and only if
and
.
Therefore, a string is accepted by if and only if it is accepted by both and . That is, the string is in .
QED.
Show that the language
is context free.
We can construct an npda or context-free grammar for , but this is tedious. Instead, we use closure of regular intersection (Linz Theorem 8.5).
Let .
is finite, and thus also regular. Hence, is regular because regular languages are closed under complementation.
From previous results, we know that is context free.
Clearly, .
By the closure of context-free languages under regular intersection, is a context-free language.
Show that
is not context free.
Although we could use the Pumping Lemma for Context-Free Languages, we again use closure of regular intersection (Linz Theorem 8.5).
Assume that is context free. Show that this leads to a contradiction.
Thus
is also context free. But we have previously proved that this language is not context free.
Thus we have a contradiction. Therefore, is not context free.
There exist algorithms for determine whether a context-free language is empty or nonempty and finite or infinite.
These algorithms process the context-free grammars for the languages. They assume that the grammars are first transformed using various algorithms from Linz Chapter 6 (which we did not cover in this course).
The algorithms from Chapter 6 include the removal of
useless symbols and productions (i.e., variables and productions that can never generate a sentence)
-productions (i.e., productions with on the right side)
unit productions (i.e., productions of the form )
Linz Theorem 8.6 (Determining Empty Context-Free Languages): Given a context-free grammar , then there exists an algorithm for determining whether or not is empty.
Basic idea of algorithm: Assuming , remove the useless productions. If the start symbol is useless, then is empty. Otherwise, is nonempty.
Linz Theorem 8.7 (Determining Infinite Context-Free Languages): Given a context-free grammar , then there exists an algorithm for determining whether or not is infinite.
Basic idea of algorithm: Remove useless symbols, -productions, and unit productions. If there are variables that repeat as in
then the language is infinite. Otherwise, the language is finite. To determine repeated variables, we can build a graph of the dependencies of the variables on each other. If this graph has a cycle, then the variable at the base of the cycle is repeated.
Unfortunately, other simple properties are not as easy as the above.
For example, there is no algorithm to determine whether two context-free grammars generate the same language.