CSci 311, Models of Computation Chapter 8 Properties of Context-Free Languages

Introduction
8.1 Two Pumping Lemmas
- 8.1.1 Context-Free Languages
- 8.1.2 Linear Languages
8.2 Closure Properties and Decision Algorithms for Context-Free Languages

Acknowledgements: These lecture notes are for use with Chapter 8 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012. The terminology and notation used in these notes are similar to those used in the Linz textbook. This document uses several figures from the Linz textbook.

Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.

Introduction

Chapter 4 examines the closure properties of the family of regular languages, algorithms for determining various properties of regular languages, and methods for proving languages are not regular (e.g., the Pumping Lemma).

Chapter 8 examines similar aspects of the family of context-free languages.

8.1 Two Pumping Lemmas

Because of insufficient time and extensive coverage of the Pumping Lemma for regular languages, we will not cover the Pumping Lemmas for Context-Free Languages in this course. See section 8.1 of the Linz textbook if you are interested in this topic.

8.1.1 Context-Free Languages

Linz Section 8.1 includes the following language examples. The
results of these are used in the remainder of this chapter.

Linz Example 8.1 shows $L = \{ a^{n}b^{n}c^{n} : n \geq 0 \}$ is not context free.
Linz Example 8.2 shows $L = \{ ww : w \in \{a,b\}^{*} \}$ is not context free.
Linz Example 8.3 shows $L = \{ a^{n!} : n \geq 0 \}$ is not context free.
Linz Example 8.4 shows $L = \{ a^{n}b^{j} : n = j^{2} \}$ is not context free.

8.1.2 Linear Languages

Linz Section 8.1 includes the following definitions. (The definition of linear grammar is actually from Chapter 3.)

Definition (Linear Grammar): A linear grammar is a grammar in which at most one variable can appear on the right side of any production.

A linear context-free grammar is thus a context-free grammar that is also a linear grammar.

Linz Definition 8.5 (Linear Language): A context-free language $L$ is linear if there exists a linear context-free grammar $G$ such that $L = L(G)$ .

Linz Section 8.1 also includes the following language examples.

Linz Example 8.5 shows $L = \{ a^{n}b^{n} : n \geq 0 \}$ is a linear language.
Linz Example 8.6 shows $L = \{ w : n_{a}(w) = n_{b}(w) \}$ is not linear.

8.2 Closure Properties and Decision Algorithms for Context-Free Languages

In most cases, the proofs and algorithms for the properties of regular languages rely upon manipulation of transition graphs for finite automata. Hence, they are relatively straightforward.

When we consider similar properties for context-free languages, we encounter more difficulties.

Some properties do not hold.
Other properties require more complex arguments.
Some intuitively simple questions cannot be answered.

Let's consider closure under the simple set operations as we did for regular languages in Linz Theorem 4.1.

8.2.1 Closure under Union, Concatenation, and Star-Closure

Linz Theorem 8.3 (Closure under Union, Concatenation, and Star-Closure): The family of context-free languages is closed under (a) union, (b) concatenation, and (c) star-closure.

(8.3a) Proof of Closure under Union:

Let $L_{1}$ and $L_{2}$ be context-free languages with the corresponding context-free grammars $G_{1} = (V_{1}, T_{1}, S_{1}, P_{1})$ and $G_{2} = (V_{2}, T_{2}, S_{2}, P_{2})$ .

Assume $V_{1}$ and $V_{2}$ are disjoint. (If not, we can make them so by renaming.)

Consider $L(G_{3})$ where

$G_{3} = (V_{1} \cup V_{2} \cup \{S_{3}\}, T_{1} \cup T_{2}, S_{3}, P_{3})$

with:

$S_{3} \notin V_{1} \cup V_{2}$ -- i.e, $S_{3}$ is a fresh variable
$P_{3} = P_{1} \cup P_{2} \cup \{\ S_{3} \rightarrow S_{1}\ |\ \ S_{2}\ \}$

Clearly, $G_{3}$ is a context-free grammar. So $L(G_{3})$ is a context-free language.

Now, we need to show that $L(G_{3}) = L_{1} \cup L_{2}$ .

For $w \in L_{1}$ , there is a derivation in $G_{3}$ :

$S_{3} \Rightarrow S_{1} \overset{*}{\Rightarrow} w$

Similarly, for $w \in L_{2}$ , there is a derivation in $G_{3}$ :

$S_{3} \Rightarrow S_{2} \overset{*}{\Rightarrow} w$

Also, for $w \in L(G_{3})$ , the first step of the derivation must be either (1) $S_{3} \Rightarrow S_{1}$ or (2) $S_{3} \Rightarrow S_{2}$ .

For choice 1, the sentential forms derived from $S_{1}$ only have variables from $V_{1}$ . But $V_{1}$ is disjoint from $V_{2}$ . Thus the derivation

$S_{1} \overset{*}{\Rightarrow} w$

can only involve productions from from $P_{1}$ . Hence, for choice 1, $w \in L_{1}$ .

Using a similar argument for choice 2, we conclude $w \in L_{2}$ .

Therefore, $L(G_{3}) = L_{1} \cup L_{2}$ .

QED.

(8.3b) Proof of Closure under Concatenation:

Consider $L(G_{4})$ where

$G_{4} = (V_{1} \cup V_{2} \cup \{ S_{4} \}, T_{1} \cup T_{2}, S_{4}, P_{4})$

with:

$S_{4} \notin V_{1} \cup V_{2}$
$P_{4} = P_{1} \cup P_{2} \cup \{\ S_{4} \rightarrow S_{1} S_{2}\ \}$

Then $L(G_{4}) = L_{1} L_{2}$ follows from a similar argument to the one in part (a).

QED.

(8.3c) Proof of Closure under Star-Closure:

Consider $L(G_{5})$ where

$G_{5} = (V_{1} \cup \{ S_{5} \}, T_{1}, S_{5}, P_{5})$

with:

$S_{5} \notin V_{1}$
$P_{5} = P_{1} \cup \{\ S_{5} \rightarrow S_{1} S_{5}\ |\ \lambda\ \}$

Then $L(G_{5}) = L_{1}^{*}$ follows from a similar argument to the one in part (a).

QED.

8.2.2 Non-Closure under Intersection and Complementation

Linz Theorem 8.4 (Non-closure under Intersection and Complementation): The family of context-free languages is not closed under (a) intersection and (b) complementation.

(8.4b) Proof of Non-closure under Intersection:

Assume the family of context-free languages is closed under intersection. Show that this leads to a contradiction.

It is sufficient to find two context-free languages whose intersection is not context-free.

Consider languages $L_{1}$ and $L_{2}$ defined as follows:

$L_{1} = \{ a^{n}b^{n}c^{m} : n \geq 0, m \geq 0 \}$
$L_{2} = \{ a^{n}b^{m}c^{m} : n \geq 0, m \geq 0 \}$

One way to show that a language is context-free is to find a context-free grammar that generates it. The following context-free grammar generates $L_{1}$ :

$S$ $\rightarrow S_{1}S_{2}$
$S_{1}$ $\rightarrow aS_{1}b\ |\ \lambda$
$S_{2}$ $\rightarrow cS_{2}\ |\ \lambda$

Alternatively, we could observe that $L_{1}$ is the concatenation of two context-free languages and, hence, context-free by Linz Theorem 8.3 above.

Similarly, we can show that $L_{2}$ is context free.

From the assumption, we thus have that $L_{1} \cap L_{2}$ is context free.

But

$L_{1} \cap L_{2} = \{ a^{n}b^{n}c^{n} : n \geq 0 \}$ ,

which is not context free. Linz proves this in Linz Example 8.1 (which is in the part of this chapter we did not cover in this course).

Thus we have a contradiction. Therefore, the family of context-free languages is not closed under intersection.

QED.

(8.4b) Proof of Non-closure under Complementation:

Assume the family of context-free languages is closed under complementation. Show that this leads to a contradiction.

Consider arbitrary context-free languages $L_{1}$ and $L_{2}$ .

From set theory, we know that

$L_{1} \cap L_{2}$ $=$ $\overline{\bar{L_{1}} \cup \bar{L_{2}}}$ .

From Linz Theorem 8.3 and the assumption that context-free languages are closed under complementation, we deduce that the right side ( $\overline{\bar{L_{1}} \cup \bar{L_{2}}}$ ) is a context-free language for all $L_{1}$ and $L_{2}$ .

However, we know from part (a) that the left side ( $L_{1} \cap L_{2}$ ) is not necessarily a context-free language for all $L_{1}$ and $L_{2}$ .

Thus we have a contradiction. Therefore, the family of context-free languages is not closed under complementation.

QED.

8.2.3 Closure under Regular Intersection

Although context-free languages are not, in general, closed under intersection, there is a useful special case that is closed.

Linz Theorem 8.5 (Closure Under Regular Intersection): Let $L_{1}$ be a context-free language and $L_{2}$ be a regular language. Then $L_{1} \cap L_{2}$ is context free.

Proof:

Let $M_{1} = (Q,\Sigma,\Gamma,\delta_{1},q_{0},z,F_{1})$ be an npda that accepts context-free language $L_{1}$ .

Let $M_{2} = (P,\Sigma,\delta_{2},p_{0},F_{2})$ be a dfa that accepts regular language $L_{2}$ .

We construct an npda

$\widehat{M} = (\widehat{Q},\Sigma,\Gamma,\widehat{\delta},\widehat{q_{0}},\widehat{F})$

that simulates $M_{1}$ and $M_{2}$ operating simultaneously (i.e., executes the moves of both machines for each input symbol).

We choose pairs of states from $M_{1}$ and $M_{2}$ to represent the states of $\widehat{M}$ as follows:

$\widehat{Q} = Q \times P$
$\widehat{q_{0}} = (q_{0},p_{0})$
$\widehat{F} = F_{1} \times F_{2}$

We specify $\widehat{\delta}$ such that the moves of $\widehat{M}$ correspond to simultaneous moves of $M_{1}$ and $M_{2}$ . That is,

$((q_{k},p_{l}),x) \in \widehat{\delta}((q_{i},p_{j}),a,b)$

if and only if

$(q_{k},x) \in \delta_{1}(q_{i},a,b)$

and

$\delta_{2}(p_{j},a) = p_{l}$ .

For moves $(q_{i},\lambda,b)$ in $\delta_{1}$ , we extend $\delta_{2}$ so that $\delta_{2}(p_{l},\lambda) = p_{l}$ for all $l$ .

By induction on the length of the derivations, we can prove that

$((q_{0},p_{0}), w, z) \vdash^{*}_{\widehat{M}} ((q_{r},p_{s}), \lambda, x)$ ,

with $q_{r} \in F_{1}$ and $p_{s} \in F_{2}$ if and only if

$(q_{0}, w, z) \vdash^{*}_{M_{1}} (q_{r}, \lambda, x)$

and

$\delta^{*}(p_{0},w) = p_{s}$ .

Therefore, a string is accepted by $\widehat{M}$ if and only if it is accepted by both $M_{1}$ and $M_{2}$ . That is, the string is in $L(M_{1}) \cap L(M_{2}) = L_{1} \cap L_{2}$ .

QED.

8.2.4 Linz Example 8.7

Show that the language

$L = \{ a^{n}b^{n} : n \geq 0, n \neq 100 \}$

is context free.

We can construct an npda or context-free grammar for $L$ , but this is tedious. Instead, we use closure of regular intersection (Linz Theorem 8.5).

Let $L_{1} = \{ a^{100}b^{100} \}$ .

$L_{1}$ is finite, and thus also regular. Hence, $\bar{L_{1}}$ is regular because regular languages are closed under complementation.

From previous results, we know that $L = \{ a^{n}b^{n} : n \geq 0 \}$ is context free.

Clearly, $L = \{ a^{n}b^{n} : n \geq 0 \} \cap \bar{L_{1}}$ .

By the closure of context-free languages under regular intersection, $L$ is a context-free language.

8.2.5 Linz Example 8.8

Show that

$L = \{ w \in \{a,b,c\}^{*} : n_{a}(w) = n_{b}(w) = n_{c}(w) \}$

is not context free.

Although we could use the Pumping Lemma for Context-Free Languages, we again use closure of regular intersection (Linz Theorem 8.5).

Assume that $L$ is context free. Show that this leads to a contradiction.

Thus

$L \cap L( a^{*} b^{*} c^{*} ) = \{ a^{n} b^{n} c^{n} : n \geq 0 \}$

is also context free. But we have previously proved that this language is not context free.

Thus we have a contradiction. Therefore, $L$ is not context free.

8.2.6 Some Decidable Properties of Context Free Languages

There exist algorithms for determine whether a context-free language is empty or nonempty and finite or infinite.

These algorithms process the context-free grammars for the languages. They assume that the grammars are first transformed using various algorithms from Linz Chapter 6 (which we did not cover in this course).

The algorithms from Chapter 6 include the removal of

useless symbols and productions (i.e., variables and productions that can never generate a sentence)
$\lambda$ -productions (i.e., productions with $\lambda$ on the right side)
unit productions (i.e., productions of the form $A \rightarrow B$ )

Linz Theorem 8.6 (Determining Empty Context-Free Languages): Given a context-free grammar $G = (V,T,S,P)$ , then there exists an algorithm for determining whether or not $L(G)$ is empty.

Basic idea of algorithm: Assuming $\lambda \notin L$ , remove the useless productions. If the start symbol is useless, then $L$ is empty. Otherwise, $L$ is nonempty.

Linz Theorem 8.7 (Determining Infinite Context-Free Languages): Given a context-free grammar $G = (V,T,S,P)$ , then there exists an algorithm for determining whether or not $L(G)$ is infinite.

Basic idea of algorithm: Remove useless symbols, $\lambda$ -productions, and unit productions. If there are variables $A$ that repeat as in

$A \overset{*}{\Rightarrow} xAy$

then the language is infinite. Otherwise, the language is finite. To determine repeated variables, we can build a graph of the dependencies of the variables on each other. If this graph has a cycle, then the variable at the base of the cycle is repeated.

Unfortunately, other simple properties are not as easy as the above.

For example, there is no algorithm to determine whether two context-free grammars generate the same language.

CSci 311, Models of Computation
Chapter 8
Properties of Context-Free Languages

H. Conrad Cunningham

29 December 2015

Introduction

8.1 Two Pumping Lemmas

8.1.1 Context-Free Languages

8.1.2 Linear Languages

8.2 Closure Properties and Decision Algorithms for Context-Free Languages

8.2.1 Closure under Union, Concatenation, and Star-Closure

8.2.2 Non-Closure under Intersection and Complementation

8.2.3 Closure under Regular Intersection

8.2.4 Linz Example 8.7

8.2.5 Linz Example 8.8

8.2.6 Some Decidable Properties of Context Free Languages