Copyright (C) 2015, H. Conrad Cunningham
Acknowledgements: MS student Eli Allen assisted in preparation of these notes. These lecture notes are for use with Chapter 11 of the textbook: Peter Linz. Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett Learning, 2012.The terminology and notation used in these notes are similar to those used in the Linz textbook.This document uses several figures from the Linz textbook.
Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of December 2015 seems to be a recent version of Firefox from Mozilla.
The kinds of questions addressed in this chapter:
What is the family of languages accepted by Turing machines?
Are there any languages that are not accepted by any Turing machine?
What is the relationship between Turing machines and various kinds of grammars?
How can we classify the various families of languages and their relationships to one another?
Note: We assume the languages in this chapter are -free unless otherwise stated.
Here we make a distinction between languages accepted by Turing machines and languages for which there is a membership algorithm.
Definition (Countable and Countably Infinite): A set is countable if it has the same cardinality as a subset of the natural numbers. A set is countably infinite if it can be placed into one-to-one correspondence with the set of all natural numbers.
Thus there is some ordering on any countable set.
Also note that, for any finite set of symbols , then and any its subsets are countable. Similarly for .
From Linz Section 10.4 (not covered in this course), we also have the following theorem about the set of Turing machines.
Linz Theorem 10.3 (Turing Machines are Countable): The set of all Turing machines is countably infinite.
Linz Definition 11.1 (Recursively Enumerable Language): A language is recursively enumerable if there exists a Turing machine that accepts it.
This definition implies there is a Turing machine such that for every
with the initial state and a final state .
But what if ?
Linz Definition 11.2 (Recursive Language): A language on is recursive if there exists a Turing machine that accepts and that halts on every in .
That is, a language is recursive if and only if there exists a membership algorithm for it.
If a language is recursive, then there exists an enumeration procedure, that is, a method for counting and ordering the strings in the language.
Let be a Turing machine that determines membership in a recursive language on an alphabet .
Let be modified to write the accepted strings to its tape.
is countable, so there is some ordering of . Construct Turing machine that generates all in order, say .
Thus generates the candidate strings in order. writes the the accepted strings to its tape in order.
Problem: A Turing machine might not halt on some strings.
Solution: Construct to advance "all" strings simultaneously, one move at a time. The order of string generation and moves is illustrated in Linz Figure 11.1.
Now machine advances each candidate string (columns of Linz Figure 11.1) one -move at a time.
Because each string is generated by and accepted by in a finite number of steps, every string in is eventually produced by . The machine does not go into an infinite loop for a that is not accepted.
Note: Turing machine does not terminate and strings for which does not halt will never complete processing, but any string that can be accepted by will be accepted within a finite number of steps.
Linz Theorem 11.1 (Powerset of Countable Set not Countable) Let be an countably infinite set. Then its powerset is not countable.
Proof: Let be an countably infinite set.
Let . Then can represented by a bit vector such that if and only if .
Assume is countable. Thus can be written in order and put into a table as shown in Linz Figure 11.2.
Consider the main diagonal of the table (circled in Linz Figure 11.2). Complement the bits along this diagonal and let be a set represented by this bit vector.
Clearly . But for any , because they differ at least at . This is a contradicts the assumption that is countable.
So the assumption is false. Therefore, is not countable. QED.
This is Cantor's diagonalization argument.
Linz Theorem 11.2 (Existence of Languages Not Recursively Enumerable): For any nonempty , there exist languages that are not recursively enumerable.
Proof: Any is a language on . Thus is the set of all languages on .
Because is infinite and countable, Linz Theorem 11.1 implies that the set of all languages on is not countable. From Linz Theorem 10.3 (see above), we know the set of Turing machines can be enumerated. Hence, the recursively enumerable languages are countable.
Therefore, some languages on are not recursively enumerable. QED.
Linz Theorem 11.3: There exists a recursively enumerable language whose complement is not recursively enumerable.
Proof: Let .
Consider the set of all Turing machines with input alphabet , i.e., .
By Linz Theorem 10.3 (see above), we know that this set of is countable. So it has some order.
For each there exists a recursively enumerable language .
Also, for each recursively enumerable languages on , there is some Turing machine that accepts it.
Let .
is recursively enumerable because here is a Turing machine that accepts it. E.g., the Turing machine works as follows:
Now consider .
Assume is recursively enumerable.
There must be some Turing machine , for some , that accepts . Hence, .
Consider . Is it in ? Or in ?
Consider the case . Thus . Hence, by the definition of . This is a contradiction.
Consider the case , i.e., . Thus by definition of . But from the defintion of , . This is also be a contradiction.
In all cases, we have a contradiction, so the assumption is false. Therefore, is not recursively enumerable. QED.
Linz Theorem 11.4: If a language and its complement are both recursively enumerable, then both languages are recursive. If is recursive, then is also recursive, and consequently both are recursively enumerable.
Proof: See Linz Section 11.2 for the details.
Linz Theorem 11.5: There exists a recursively enumerable language that is not recursive; that is, the family of recursive languages is a proper subset of the family of recursively enumerable languages.
Proof: Consider the language of Linz Theorem 11.3.
This language is recursively enumerable, but its complement is not. Therefore, by Linz Theorem 11.4, it is not recursive, giving us the required example. QED.
There are well-defined languages that have no membership algorithms.
Linz Definition 11.3 (Unrestricted Grammar): A grammar is an unrestricted gramar if all the productions are of the form
,
where is in and is in .
Note: There is no on left, but otherwise the use of symbols is unrestricted.
Linz Theorem 11.6 (Recursively Enumerable Language for Unrestricted Grammar): Any language generated by an unrestricted grammar is recursively enumerable.
Proof: See Linz Section 11.2 for the details.
The grammar defines an enumeration procedure for all strings.
Linz Theorem 11.7 (Unrestricted Grammars for Recursively Enumerable Language): For every recursively enumerable language , there exists an unrestricted grammar , such that .
Proof: See Linz Section 11.2 for the details.
Between the restricted context-free grammars and the unrestricted grammars, there are a number of kinds of "somewhat restricted" families of grammars.
Linz Definition 11.4 (Context-Sensitive Grammar): A grammar is said to be context-sensitive if all productions are of the form
,
where and
.
This type of grammar is noncontracting in that the length of successive sentential forms can never decrease.
All such grammars can be rewritten in a normal form in which all productions are of the form
.
This is equivalent to saying that the production
can only be applied in a context where occurs with string on the left and string on the right.
Linz Definition 11.5 (Context-Sensitive) : A language is said to be context-sensitive if there exists a context-sensitive grammar , such that or .
Note the special cases for . This enables us to say that the family of context-free languages is a subset of the family of context-sensitive languages.
The language is a context-sensitive language. We show this by defining a context-sensitive grammar for the language, such as the following:
Consider a derivation of :
The grammar uses the variables and as messengers.
An is created on the left, travels to the right to the first , where it creates another and .
Messanger is sent back to the left to create the corresponding .
The process is similar to how a Turing machine would work to accept the language .
is not context-free.
In Linz Section 10.5 (not covered in this course), a linear-bounded automaton is defined as a nondeterministic Turing machine that is restricted to the part of its tape occupied by its input (bounded on the left by and right by ).
.
Linz Theorem 11.8: For every context-sensitive language not including , there exists some linear bounded automaton such that :
Proof: See Linz Section 11.3 for the details.
Linz Theorem 11.9: If a language is accepted by some linear bounded automaton , then there exists a context-sensitive grammar that generates .
Proof: See Linz Section 11.3 for the details.
Linz Theorem 11.10: Every context-sensitive language is recursive.
Linz Theorem 11.11: There exists a recursive language that is not context-sensitive.
We have studied a number of automata in this course. Ordered by decreasing power these include:
We have studied a number of types of languages in this course, including
One way of showing the relationship among these families of languages is to use the Chomsky hierarchy, where the types are numbered as above and as diagrams in Linz Figures 11.3 and 11.4.
This classification was first described in 1956 by American linguist Noam Chomsky, a founder of formal language theory.