Copyright (C) 2015, H. Conrad Cunningham

Acknowledgements: These lecture notes are for use with Chapter 8 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012. The terminology and notation used in these notes are similar to those used in the Linz textbook. This document uses several figures from the Linz textbook.

Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.

Introduction

Chapter 4 examines the closure properties of the family of regular languages, algorithms for determining various properties of regular languages, and methods for proving languages are not regular (e.g., the Pumping Lemma).

Chapter 8 examines similar aspects of the family of context-free languages.

8.1 Two Pumping Lemmas

Because of insufficient time and extensive coverage of the Pumping Lemma for regular languages, we will not cover the Pumping Lemmas for Context-Free Languages in this course. See section 8.1 of the Linz textbook if you are interested in this topic.

8.1.1 Context-Free Languages

Linz Section 8.1 includes the following language examples. The
results of these are used in the remainder of this chapter.

  1. Linz Example 8.1 shows L={anbncn:n0}L = \{ a^{n}b^{n}c^{n} : n \geq 0 \} is not context free.

  2. Linz Example 8.2 shows L={ww:w{a,b}*}L = \{ ww : w \in \{a,b\}^{*} \} is not context free.

  3. Linz Example 8.3 shows L={an!:n0}L = \{ a^{n!} : n \geq 0 \} is not context free.

  4. Linz Example 8.4 shows L={anbj:n=j2}L = \{ a^{n}b^{j} : n = j^{2} \} is not context free.

8.1.2 Linear Languages

Linz Section 8.1 includes the following definitions. (The definition of linear grammar is actually from Chapter 3.)

Definition (Linear Grammar): A linear grammar is a grammar in which at most one variable can appear on the right side of any production.

A linear context-free grammar is thus a context-free grammar that is also a linear grammar.

Linz Definition 8.5 (Linear Language): A context-free language LL is linear if there exists a linear context-free grammar GG such that L=L(G)L = L(G).

Linz Section 8.1 also includes the following language examples.

  1. Linz Example 8.5 shows L={anbn:n0}L = \{ a^{n}b^{n} : n \geq 0 \} is a linear language.

  2. Linz Example 8.6 shows L={w:na(w)=nb(w)}L = \{ w : n_{a}(w) = n_{b}(w) \} is not linear.

8.2 Closure Properties and Decision Algorithms for Context-Free Languages

In most cases, the proofs and algorithms for the properties of regular languages rely upon manipulation of transition graphs for finite automata. Hence, they are relatively straightforward.

When we consider similar properties for context-free languages, we encounter more difficulties.

Let's consider closure under the simple set operations as we did for regular languages in Linz Theorem 4.1.

8.2.1 Closure under Union, Concatenation, and Star-Closure

Linz Theorem 8.3 (Closure under Union, Concatenation, and Star-Closure): The family of context-free languages is closed under (a) union, (b) concatenation, and (c) star-closure.

(8.3a) Proof of Closure under Union:

Let L1L_{1} and L2L_{2} be context-free languages with the corresponding context-free grammars G1=(V1,T1,S1,P1)G_{1} = (V_{1}, T_{1}, S_{1}, P_{1}) and G2=(V2,T2,S2,P2)G_{2} = (V_{2}, T_{2}, S_{2}, P_{2}).

Assume V1V_{1} and V2V_{2} are disjoint. (If not, we can make them so by renaming.)

Consider L(G3)L(G_{3}) where

G3=(V1V2{S3},T1T2,S3,P3)G_{3} = (V_{1} \cup V_{2} \cup \{S_{3}\}, T_{1} \cup T_{2}, S_{3}, P_{3})

with:

S3V1V2S_{3} \notin V_{1} \cup V_{2} -- i.e, S3S_{3} is a fresh variable
P3=P1P2{S3S1|S2}P_{3} = P_{1} \cup P_{2} \cup \{\ S_{3} \rightarrow S_{1}\ |\ \ S_{2}\ \}

Clearly, G3G_{3} is a context-free grammar. So L(G3)L(G_{3}) is a context-free language.

Now, we need to show that L(G3)=L1L2L(G_{3}) = L_{1} \cup L_{2}.

For wL1w \in L_{1}, there is a derivation in G3G_{3}:

  1. S3S1*wS_{3} \Rightarrow S_{1} \overset{*}{\Rightarrow} w

Similarly, for wL2w \in L_{2}, there is a derivation in G3G_{3}:

  1. S3S2*wS_{3} \Rightarrow S_{2} \overset{*}{\Rightarrow} w

Also, for wL(G3)w \in L(G_{3}), the first step of the derivation must be either (1) S3S1S_{3} \Rightarrow S_{1} or (2) S3S2S_{3} \Rightarrow S_{2}.

For choice 1, the sentential forms derived from S1S_{1} only have variables from V1V_{1}. But V1V_{1} is disjoint from V2V_{2}. Thus the derivation

S1*wS_{1} \overset{*}{\Rightarrow} w

can only involve productions from from P1P_{1}. Hence, for choice 1, wL1w \in L_{1}.

Using a similar argument for choice 2, we conclude wL2w \in L_{2}.

Therefore, L(G3)=L1L2L(G_{3}) = L_{1} \cup L_{2}.

QED.

(8.3b) Proof of Closure under Concatenation:

Consider L(G4)L(G_{4}) where

G4=(V1V2{S4},T1T2,S4,P4)G_{4} = (V_{1} \cup V_{2} \cup \{ S_{4} \}, T_{1} \cup T_{2}, S_{4}, P_{4})

with:

S4V1V2S_{4} \notin V_{1} \cup V_{2}
P4=P1P2{S4S1S2}P_{4} = P_{1} \cup P_{2} \cup \{\ S_{4} \rightarrow S_{1} S_{2}\ \}

Then L(G4)=L1L2L(G_{4}) = L_{1} L_{2} follows from a similar argument to the one in part (a).

QED.

(8.3c) Proof of Closure under Star-Closure:

Consider L(G5)L(G_{5}) where

G5=(V1{S5},T1,S5,P5)G_{5} = (V_{1} \cup \{ S_{5} \}, T_{1}, S_{5}, P_{5})

with:

S5V1S_{5} \notin V_{1}
P5=P1{S5S1S5|λ}P_{5} = P_{1} \cup \{\ S_{5} \rightarrow S_{1} S_{5}\ |\ \lambda\ \}

Then L(G5)=L1*L(G_{5}) = L_{1}^{*} follows from a similar argument to the one in part (a).

QED.

8.2.2 Non-Closure under Intersection and Complementation

Linz Theorem 8.4 (Non-closure under Intersection and Complementation): The family of context-free languages is not closed under (a) intersection and (b) complementation.

(8.4b) Proof of Non-closure under Intersection:

Assume the family of context-free languages is closed under intersection. Show that this leads to a contradiction.

It is sufficient to find two context-free languages whose intersection is not context-free.

Consider languages L1L_{1} and L2L_{2} defined as follows:

L1={anbncm:n0,m0}L_{1} = \{ a^{n}b^{n}c^{m} : n \geq 0, m \geq 0 \}
L2={anbmcm:n0,m0}L_{2} = \{ a^{n}b^{m}c^{m} : n \geq 0, m \geq 0 \}

One way to show that a language is context-free is to find a context-free grammar that generates it. The following context-free grammar generates L1L_{1}:

SS S1S2\rightarrow S_{1}S_{2}
S1S_{1} aS1b|λ\rightarrow aS_{1}b\ |\ \lambda
S2S_{2} cS2|λ\rightarrow cS_{2}\ |\ \lambda

Alternatively, we could observe that L1L_{1} is the concatenation of two context-free languages and, hence, context-free by Linz Theorem 8.3 above.

Similarly, we can show that L2L_{2} is context free.

From the assumption, we thus have that L1L2L_{1} \cap L_{2} is context free.

But

L1L2={anbncn:n0}L_{1} \cap L_{2} = \{ a^{n}b^{n}c^{n} : n \geq 0 \},

which is not context free. Linz proves this in Linz Example 8.1 (which is in the part of this chapter we did not cover in this course).

Thus we have a contradiction. Therefore, the family of context-free languages is not closed under intersection.

QED.

(8.4b) Proof of Non-closure under Complementation:

Assume the family of context-free languages is closed under complementation. Show that this leads to a contradiction.

Consider arbitrary context-free languages L1L_{1} and L2L_{2}.

From set theory, we know that

L1L2L_{1} \cap L_{2} == L1L2¯\overline{\bar{L_{1}} \cup \bar{L_{2}}}.

From Linz Theorem 8.3 and the assumption that context-free languages are closed under complementation, we deduce that the right side (L1L2¯\overline{\bar{L_{1}} \cup \bar{L_{2}}}) is a context-free language for all L1L_{1} and L2L_{2}.

However, we know from part (a) that the left side (L1L2L_{1} \cap L_{2}) is not necessarily a context-free language for all L1L_{1} and L2L_{2}.

Thus we have a contradiction. Therefore, the family of context-free languages is not closed under complementation.

QED.

8.2.3 Closure under Regular Intersection

Although context-free languages are not, in general, closed under intersection, there is a useful special case that is closed.

Linz Theorem 8.5 (Closure Under Regular Intersection): Let L1L_{1} be a context-free language and L2L_{2} be a regular language. Then L1L2L_{1} \cap L_{2} is context free.

Proof:

Let M1=(Q,Σ,Γ,δ1,q0,z,F1)M_{1} = (Q,\Sigma,\Gamma,\delta_{1},q_{0},z,F_{1}) be an npda that accepts context-free language L1L_{1}.

Let M2=(P,Σ,δ2,p0,F2)M_{2} = (P,\Sigma,\delta_{2},p_{0},F_{2}) be a dfa that accepts regular language L2L_{2}.

We construct an npda

Mˆ=(Qˆ,Σ,Γ,δˆ,q0ˆ,Fˆ)\widehat{M} = (\widehat{Q},\Sigma,\Gamma,\widehat{\delta},\widehat{q_{0}},\widehat{F})

that simulates M1M_{1} and M2M_{2} operating simultaneously (i.e., executes the moves of both machines for each input symbol).

We choose pairs of states from M1M_{1} and M2M_{2} to represent the states of Mˆ\widehat{M} as follows:

Qˆ=Q×P\widehat{Q} = Q \times P
q0ˆ=(q0,p0)\widehat{q_{0}} = (q_{0},p_{0})
Fˆ=F1×F2\widehat{F} = F_{1} \times F_{2}

We specify δˆ\widehat{\delta} such that the moves of Mˆ\widehat{M} correspond to simultaneous moves of M1M_{1} and M2M_{2}. That is,

((qk,pl),x)δˆ((qi,pj),a,b)((q_{k},p_{l}),x) \in \widehat{\delta}((q_{i},p_{j}),a,b)

if and only if

(qk,x)δ1(qi,a,b)(q_{k},x) \in \delta_{1}(q_{i},a,b)

and

δ2(pj,a)=pl\delta_{2}(p_{j},a) = p_{l}.

For moves (qi,λ,b)(q_{i},\lambda,b) in δ1\delta_{1}, we extend δ2\delta_{2} so that δ2(pl,λ)=pl\delta_{2}(p_{l},\lambda) = p_{l} for all ll.

By induction on the length of the derivations, we can prove that

((q0,p0),w,z)Mˆ*((qr,ps),λ,x)((q_{0},p_{0}), w, z) \vdash^{*}_{\widehat{M}} ((q_{r},p_{s}), \lambda, x),

with qrF1q_{r} \in F_{1} and psF2p_{s} \in F_{2} if and only if

(q0,w,z)M1*(qr,λ,x)(q_{0}, w, z) \vdash^{*}_{M_{1}} (q_{r}, \lambda, x)

and

δ*(p0,w)=ps\delta^{*}(p_{0},w) = p_{s}.

Therefore, a string is accepted by Mˆ\widehat{M} if and only if it is accepted by both M1M_{1} and M2M_{2}. That is, the string is in L(M1)L(M2)=L1L2L(M_{1}) \cap L(M_{2}) = L_{1} \cap L_{2}.

QED.

8.2.4 Linz Example 8.7

Show that the language

L={anbn:n0,n100}L = \{ a^{n}b^{n} : n \geq 0, n \neq 100 \}

is context free.

We can construct an npda or context-free grammar for LL, but this is tedious. Instead, we use closure of regular intersection (Linz Theorem 8.5).

Let L1={a100b100}L_{1} = \{ a^{100}b^{100} \}.

L1L_{1} is finite, and thus also regular. Hence, L1\bar{L_{1}} is regular because regular languages are closed under complementation.

From previous results, we know that L={anbn:n0}L = \{ a^{n}b^{n} : n \geq 0 \} is context free.

Clearly, L={anbn:n0}L1L = \{ a^{n}b^{n} : n \geq 0 \} \cap \bar{L_{1}}.

By the closure of context-free languages under regular intersection, LL is a context-free language.

8.2.5 Linz Example 8.8

Show that

L={w{a,b,c}*:na(w)=nb(w)=nc(w)}L = \{ w \in \{a,b,c\}^{*} : n_{a}(w) = n_{b}(w) = n_{c}(w) \}

is not context free.

Although we could use the Pumping Lemma for Context-Free Languages, we again use closure of regular intersection (Linz Theorem 8.5).

Assume that LL is context free. Show that this leads to a contradiction.

Thus

LL(a*b*c*)={anbncn:n0}L \cap L( a^{*} b^{*} c^{*} ) = \{ a^{n} b^{n} c^{n} : n \geq 0 \}

is also context free. But we have previously proved that this language is not context free.

Thus we have a contradiction. Therefore, LL is not context free.

8.2.6 Some Decidable Properties of Context Free Languages

There exist algorithms for determine whether a context-free language is empty or nonempty and finite or infinite.

These algorithms process the context-free grammars for the languages. They assume that the grammars are first transformed using various algorithms from Linz Chapter 6 (which we did not cover in this course).

The algorithms from Chapter 6 include the removal of

Linz Theorem 8.6 (Determining Empty Context-Free Languages): Given a context-free grammar G=(V,T,S,P)G = (V,T,S,P), then there exists an algorithm for determining whether or not L(G)L(G) is empty.

Basic idea of algorithm: Assuming λL\lambda \notin L, remove the useless productions. If the start symbol is useless, then LL is empty. Otherwise, LL is nonempty.

Linz Theorem 8.7 (Determining Infinite Context-Free Languages): Given a context-free grammar G=(V,T,S,P)G = (V,T,S,P), then there exists an algorithm for determining whether or not L(G)L(G) is infinite.

Basic idea of algorithm: Remove useless symbols, λ\lambda-productions, and unit productions. If there are variables AA that repeat as in

A*xAyA \overset{*}{\Rightarrow} xAy

then the language is infinite. Otherwise, the language is finite. To determine repeated variables, we can build a graph of the dependencies of the variables on each other. If this graph has a cycle, then the variable at the base of the cycle is repeated.

Unfortunately, other simple properties are not as easy as the above.

For example, there is no algorithm to determine whether two context-free grammars generate the same language.