Copyright (C) 2015, H. Conrad Cunningham

Acknowledgements: MS student Eli Allen assisted in preparation of these notes. These lecture notes are for use with Chapter 11 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012.The terminology and notation used in these notes are similar to those used in the Linz textbook.This document uses several figures from the Linz textbook.

Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.

Introduction

The kinds of questions addressed in this chapter:

Note: We assume the languages in this chapter are λ\lambda-free unless otherwise stated.

11.1 Recursive and Recursively Enumerable Languages

Here we make a distinction between languages accepted by Turing machines and languages for which there is a membership algorithm.

11.1.1 Aside: Countability

Definition (Countable and Countably Infinite): A set is countable if it has the same cardinality as a subset of the natural numbers. A set is countably infinite if it can be placed into one-to-one correspondence with the set of all natural numbers.

Thus there is some ordering on any countable set.

Also note that, for any finite set of symbols Σ\Sigma, then Σ*\Sigma^{*} and any its subsets are countable. Similarly for Σ+\Sigma^{+}.

From Linz Section 10.4 (not covered in this course), we also have the following theorem about the set of Turing machines.

Linz Theorem 10.3 (Turing Machines are Countable): The set of all Turing machines is countably infinite.

11.1.2 Definition of Recursively Enumerable Language

Linz Definition 11.1 (Recursively Enumerable Language): A language LL is recursively enumerable if there exists a Turing machine that accepts it.

This definition implies there is a Turing machine MM such that for every wLw \in L

q0wM*x1qfx2q_{0}w \ \vdash^{*}_{M}\ x_{1}q_{f}x_{2}

with the initial state q0q_{0} and a final state qfq_{f}.

But what if wLw \notin L?

11.1.3 Definition of Recursive Language

Linz Definition 11.2 (Recursive Language): A language LL on Σ\Sigma is recursive if there exists a Turing machine MM that accepts LL and that halts on every ww in Σ*\Sigma^{*}.

That is, a language is recursive if and only if there exists a membership algorithm for it.

11.1.4 Enumeration Procedure for Recursive Languages

If a language is recursive, then there exists an enumeration procedure, that is, a method for counting and ordering the strings in the language.

Thus M^\hat{M} generates the candidate strings wiw_{i} in order. MM' writes the the accepted strings to its tape in order.

11.1.5 Enumeration Procedure for Recursively Enumerable Languages

Problem: A Turing machine MM might not halt on some strings.

Solution: Construct M^\hat{M} to advance "all" strings simultaneously, one move at a time. The order of string generation and moves is illustrated in Linz Figure 11.1.

Now machine M^\hat{M} advances each candidate string wiw_{i} (columns of Linz Figure 11.1) one MM-move at a time.

Linz Fig. 11.1: Enumeration Procedure for Recursively Enumerable Languages

Linz Fig. 11.1: Enumeration Procedure for Recursively Enumerable Languages

Because each string is generated by M^\hat{M} and accepted by MM in a finite number of steps, every string in LL is eventually produced by MM. The machine does not go into an infinite loop for a wiw_{i} that is not accepted.

Note: Turing machine M^\hat{M} does not terminate and strings for which MM does not halt will never complete processing, but any string that can be accepted by MM will be accepted within a finite number of steps.

11.1.6 Languages That are Not Recursively Enumerable

Linz Theorem 11.1 (Powerset of Countable Set not Countable) Let SS be an countably infinite set. Then its powerset 2S2^{S} is not countable.

Proof: Let S={s1,s2,s3,}S = \{\ s_{1}, s_{2}, s_{3}, \cdots\ \} be an countably infinite set.

Let t2St \in 2^{S}. Then tt can represented by a bit vector b1b2b_{1}b_{2}\cdots such that bi=1b_{i} = 1 if and only if sits_{i} \in t.

Assume 2S2^{S} is countable. Thus 2S2^{S} can be written in order t1,t2,t_{1}, t_{2}, \cdots and put into a table as shown in Linz Figure 11.2.

Linz Fig. 11.2: Cantor's Diagonalization

Linz Fig. 11.2: Cantor's Diagonalization

Consider the main diagonal of the table (circled in Linz Figure 11.2). Complement the bits along this diagonal and let tdt_{d} be a set represented by this bit vector.

Clearly td2St_{d} \in 2^{S}. But tdtit_{d} \neq t_{i} for any ii, because they differ at least at sis_{i}. This is a contradicts the assumption that 2S2^{S} is countable.

So the assumption is false. Therefore, 2S2^{S} is not countable. QED.

This is Cantor's diagonalization argument.

Linz Theorem 11.2 (Existence of Languages Not Recursively Enumerable): For any nonempty Σ\Sigma, there exist languages that are not recursively enumerable.

Proof: Any LΣ*L \subseteq \Sigma^{*} is a language on Σ\Sigma. Thus 2Σ*2^{\Sigma^{*}} is the set of all languages on Σ\Sigma.

Because Σ*\Sigma^{*} is infinite and countable, Linz Theorem 11.1 implies that the set of all languages on Σ\Sigma is not countable. From Linz Theorem 10.3 (see above), we know the set of Turing machines can be enumerated. Hence, the recursively enumerable languages are countable.

Therefore, some languages on Σ\Sigma are not recursively enumerable. QED.

11.1.7 A Language That is Not Recursively Enumerable

Linz Theorem 11.3: There exists a recursively enumerable language whose complement is not recursively enumerable.

Proof: Let Σ={a}\Sigma = \{ a \}.

Consider the set of all Turing machines with input alphabet Σ\Sigma, i.e., {M1,M2,M3,}\{ M_{1}, M_{2}, M_{3}, \cdots \}.

By Linz Theorem 10.3 (see above), we know that this set of is countable. So it has some order.

For each MiM_{i} there exists a recursively enumerable language L(Mi)L(M_{i}).

Also, for each recursively enumerable languages on Σ\Sigma, there is some Turing machine that accepts it.

Let L={ai:aiL(Mi)}L = \{ a^{i} : a^{i} \in L(M_{i}) \}.

LL is recursively enumerable because here is a Turing machine that accepts it. E.g., the Turing machine works as follows:

Now consider L={ai:aiL(Mi)}\bar{L} = \{ a^{i} : a^{i} \notin L(M_{i}) \}.

Assume L\bar{L} is recursively enumerable.

There must be some Turing machine MkM_{k}, for some kk, that accepts L\bar{L}. Hence, L=L(Mk)\bar{L} = L(M_{k}).

Consider aka^{k}. Is it in LL? Or in L\bar{L}?

Consider the case akLa^{k} \in \bar{L}. Thus akL(Mk)a^{k} \in L(M_{k}). Hence, akLa^{k} \in L by the definition of LL. This is a contradiction.

Consider the case akLa^{k} \in L, i.e., akLa^{k}\notin \bar{L}. Thus akL(Mk)a^{k} \notin L(M_{k}) by definition of L\bar{L}. But from the defintion of LL, akLa^{k} \in \bar{L}. This is also be a contradiction.

In all cases, we have a contradiction, so the assumption is false. Therefore, L\bar{L} is not recursively enumerable. QED.

11.1.8 A Language That is Recursively Enumerable but Not Recursive

Linz Theorem 11.4: If a language LL and its complement L\bar{L} are both recursively enumerable, then both languages are recursive. If LL is recursive, then L\bar{L} is also recursive, and consequently both are recursively enumerable.

Proof: See Linz Section 11.2 for the details.

Linz Theorem 11.5: There exists a recursively enumerable language that is not recursive; that is, the family of recursive languages is a proper subset of the family of recursively enumerable languages.

Proof: Consider the language LL of Linz Theorem 11.3.

This language is recursively enumerable, but its complement is not. Therefore, by Linz Theorem 11.4, it is not recursive, giving us the required example. QED.

There are well-defined languages that have no membership algorithms.

11.2 Unrestricted Grammars

Linz Definition 11.3 (Unrestricted Grammar): A grammar G=(V,T,S,P)G = (V, T, S, P) is an unrestricted gramar if all the productions are of the form

uvu \rightarrow v,

where uu is in (VT)+(V \cup T)^{+} and vv is in (VT)*(V \cup T)^{*}.

Note: There is no λ\lambda on left, but otherwise the use of symbols is unrestricted.

Linz Theorem 11.6 (Recursively Enumerable Language for Unrestricted Grammar): Any language generated by an unrestricted grammar is recursively enumerable.

Proof: See Linz Section 11.2 for the details.

The grammar defines an enumeration procedure for all strings.

Linz Theorem 11.7 (Unrestricted Grammars for Recursively Enumerable Language): For every recursively enumerable language LL, there exists an unrestricted grammar GG, such that L=L(G)L = L(G).

Proof: See Linz Section 11.2 for the details.

11.3 Context-Sensitive Grammars and Languages

Between the restricted context-free grammars and the unrestricted grammars, there are a number of kinds of "somewhat restricted" families of grammars.

Linz Definition 11.4 (Context-Sensitive Grammar): A grammar G=(V,T,S,P)G = (V, T, S, P) is said to be context-sensitive if all productions are of the form

xyx \rightarrow y,

where x,y(VT)+x, y \in (V \cup T)^{+} and

|x||y||x| \leq |y|.

This type of grammar is noncontracting in that the length of successive sentential forms can never decrease.

All such grammars can be rewritten in a normal form in which all productions are of the form

xAyxvyxAy \rightarrow xvy.

This is equivalent to saying that the production

AvA \rightarrow v

can only be applied in a context where AA occurs with string xx on the left and string yy on the right.

Linz Definition 11.5 (Context-Sensitive) : A language LL is said to be context-sensitive if there exists a context-sensitive grammar GG, such that L=L(G)L = L(G) or L=L(G){λ}L = L(G) \cup \{ \lambda \}.

Note the special cases for λ\lambda. This enables us to say that the family of context-free languages is a subset of the family of context-sensitive languages.

11.3.1 Linz Example 11.2

The language L={anbncn:n1}L = \{ a^{n}b^{n}c^{n} : n \geq 1 \} is a context-sensitive language. We show this by defining a context-sensitive grammar for the language, such as the following:

Sabc|aAbcS \rightarrow abc\ |\ aAbc
AbbAAb \rightarrow bA
AcBbccAc \rightarrow Bbcc
bBBbbB \rightarrow Bb
aBaa|aaAaB \rightarrow aa\ |\ aaA

Consider a derivation of a3b3c3a^{3}b^{3}c^{3}:

SS aAbcabAcabBbcc\Rightarrow aAbc \Rightarrow abAc \Rightarrow abBbcc
aBbbccaaAbbccaabAbcc\Rightarrow aBbbcc \Rightarrow aaAbbcc \Rightarrow aabAbcc
aabbAccaabbBbcccaabBbbccc\Rightarrow aabbAcc \Rightarrow aabbBbccc \Rightarrow aabBbbccc
aaabbbccc\Rightarrow aaabbbccc

The grammar uses the variables AA and BB as messengers.

The process is similar to how a Turing machine would work to accept the language LL.

LL is not context-free.

11.3.2 Linear Bounded Automata (lba)

In Linz Section 10.5 (not covered in this course), a linear-bounded automaton is defined as a nondeterministic Turing machine that is restricted to the part of its tape occupied by its input (bounded on the left by [[ and right by ]]).

[______][\_ \_ \_ \_ \_ \_].

Linz Theorem 11.8: For every context-sensitive language LL not including λ\lambda, there exists some linear bounded automaton MM such that L=L(M)L = L(M):

Proof: See Linz Section 11.3 for the details.

Linz Theorem 11.9: If a language LL is accepted by some linear bounded automaton MM, then there exists a context-sensitive grammar that generates LL.

Proof: See Linz Section 11.3 for the details.

11.3.3 Relation Between Recursive and Context-Sensitive Languages

Linz Theorem 11.10: Every context-sensitive language LL is recursive.

Linz Theorem 11.11: There exists a recursive language that is not context-sensitive.

We have studied a number of automata in this course. Ordered by decreasing power these include:

11.4 The Chomsky Hierarchy

We have studied a number of types of languages in this course, including

  1. recursively enumerable languages LREL_{RE}
  2. context-sensitive languages LCSL_{CS}
  3. context-free languages LREGL_{REG}
  4. regular languages LREGL_{REG}

One way of showing the relationship among these families of languages is to use the Chomsky hierarchy, where the types are numbered as above and as diagrams in Linz Figures 11.3 and 11.4.

This classification was first described in 1956 by American linguist Noam Chomsky, a founder of formal language theory.

Linz Fig 11.3: Original Chomsky Hierarchy

Linz Fig 11.3: Original Chomsky Hierarchy

Linz Fig 11.4: Extended Chomsky Hierarchy

Linz Fig 11.4: Extended Chomsky Hierarchy