4 August 2018
Browser Advisory: The HTML version of this textbook requires a browser that supports the display of MathML. A good choice as of August 2018 is a recent version of Firefox from Mozilla.
One particular benefit of lazy evaluation is that functions in Haskell can manipulate “infinite” data structures. Of course, a program cannot actually generate or store all of an infinite object, but lazy evaluation will allow the object to be built piece-by-piece as needed and the storage occupied by no-longer-needed pieces to be reclaimed.
This chapter explores Haskell programming techniques for infinite data structures such as lists.
TODO: - Complete chapter. - Separate out code, make sure works for Haskell 2010.
In Chapter 18, we looked at generators for infinite arithmetic sequences such as [1..]
and [1,3..]
. These infinite lists are encoded in the functions that generate the sequences. The sequences are only evaluated as far as needed.
For example, take 5 [1..]
yields:
Haskell also allows infinite lists of infinite lists to be expressed as shown in the following example which generates a table of the multiples of the positive integers.
multiples :: [[Int]]
multiples = [ [ m*n | m<-[1..]] | n <- [1..] ]
Thus multiples
represents an infinite list, as shown below (not valid Haskell code):
However, if we evaluate the expression
we get the terminating result:
Note: Remember that the operator xs !! n
returns element n
of the list xs
(where the head is element 0
).
Haskell’s infinite lists are not the same as infinite sets or infinite sequences in mathematics. Infinite lists in Haskell correspond to infinite computations whereas infinite sets in mathematics are simply definitions.
In mathematics, set .
However, in Haskell, the expression
yields:
This is a computation that never returns a result. Often, we assign this computation the value 1:4:9:
(where , pronounced “bottom” represents an undefined expression).
But the expression
yields:
Reference: This section is based, in part, on section 7.1 of the Bird and Wadler textbook [Bird 1988] and parts of Wentworth’s tutorial [Wentworth 1990].
In mathematics, the notation denotes the function composed with itself times. Thus, , , , , .
A useful function is the function iterate
such that (not valid Haskell code):
The Haskell standard Prelude defines iterate
recursively as follows:
For example, suppose we need the set of all powers of the integers.
We can define a function powertables
would expand as follows (not valid Haskell code):
Using iterate
we can define powertables
compactly as follows:
As another example, suppose we want a function to extract the decimal digits of a positive integer. We can define digits
as follows:
Let’s consider how digits 178
evaluates (not actual reduction steps).
digits 178 |
|
reverse . map ( mod10) . takeWhile (/= 0) [178,17,1,0,0, ...] |
|
reverse . map ( mod10) [178,17,1] |
|
reverse [8,7,1] |
|
[1,7,8] |
Reference: This section is based in part on section 7.2 of the Bird and Wadler textbook [Bird 1988].
The Greek mathematician Eratosthenes described essentially the following procedure for generating the list of all prime numbers. This algorithm is called the Sieve of Eratosthenes.
Generate the list 2, 3, 4,
Mark the first element as prime.
Delete all multiples of from the list.
Return to step 2.
Not only is the 2-3-4 loop infinite, but so are steps 1 and 3 themselves.
There is a straightforward translation of this algorithm to Haskell.
primes :: [Int]
primes = map head (iterate sieve [2..])
sieve (p:xs) = [x | x <- xs, x `mod` p /= 0 ]
Note: This uses an intermediate infinite list of infinite lists; even though it is evaluated lazily, it is still inefficient.
We can use function primes
in various ways, e.g. to find the first 1000 primes or to find all the primes that are less than 10,000.
Calculations such as these are not trivial if the computation is attempted using arrays in an “eager” language like Pascal—in particular it is difficult to know beforehand how large an array to declare for the lists.
However, by separating the concerns, that is, by keeping the computation of the primes separate from the application of the boundary conditions, the program becomes quite modular. The same basic computation can support different boundary conditions in different contexts.
Now let’s transform the primes
and sieve
definitions to eliminate the infinite list of infinite lists. First, let’s separate the generation of the infinite list of positive integers from the application of sieve
.
Next, let’s try to transform rsieve
into a more efficient definition.
{ rsieve
}
{ iterate
}
{ map.2
, head
}
{ sieve
}
{ rsieve
}
This calculation gives us the new definition:
This new definition is, of course, equivalent to the original one, but it is slightly more efficient in that it does not use an infinite list of infinite lists.
Reference: This is based in part on section 7.3 of the Bird and Wadler textbook [Bird 1988] and Chapter 9 of Wentworth’s tutorial [Wentworth 1990].
Suppose a program produces a data structure (e.g. a list) as its output. And further suppose the program feeds that output structure back into the input so that later elements in the structure depend on earlier elements. These might be called circular, cyclic, or self-referential structures.
Consider a list consisting of the integer one repeated infinitely:
As an expression graph, ones
consists of a cons operator with two children, the integer 1
on the left and a recursive reference to ones
(i.e. a self loop) on the right. Thus the infinite list ones
is represented in a finite amount of space.
Function numsFrom
below is a perhaps more useful function. It generates a list of successive integers beginning with n
:
Using numsFrom
we can construct an infinite list of the natural number multiples of an integer m
:
Of course, we cannot actually process all the members of one of these infinite lists. If we want a terminating program, we can only process some finite initial segment of the list. For example, we might want all of the multiples of 3 that are at most 2000:
We can also define a program to generate a list of the Fibonacci numbers in a circular fashion similar to ones
:
Proofs involving infinite lists are beyond the current scope of this textbook. See the Bird and Wadler textbook for more information [Bird 1988]/.
Reference: This is based, in part, on section 7.6 of the Bird and Wadler textbook [Bird 1988] and Chaptrer 9 of Wentworth’s tutorial [Wentworth 1990].
TODO: Finish Chapter
TODO
TODO
In Summer 2018, I adapted and revised this chapter from:
These previous notes drew on the presentations in the 1st edition of the Bird and Wadler textbook [Bird 1988], Wentworth’s tutorial [Wentworth 1990], and other sources.
I incorporated this work as new Chapter 30, Infinite Data Structures, in the 2018 version of the textbook Exploring Languages with Interpreters and Functional Programming and continue to revise it.
I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.
Infinite data structures, lazy evaluation, infinite sets, infinite sequences, infinite lists, infinite computations, bottom , iterate
, prime numbers, Sieve of Eratosthenes, separation of concerns, circular/cyclic/self-referential structures.