Exploring Languages

with Interpreters

and Functional Programming

Chapter 27

**H. Conrad Cunningham**

**04 April 2022**

Copyright (C) 2018, 2022, H. Conrad Cunningham

Professor of Computer and Information Science

University of Mississippi

214 Weir Hall

P.O. Box 1848

University, MS 38677

(662) 915-7396 (dept. office)

Professor of Computer and Information Science

University of Mississippi

214 Weir Hall

P.O. Box 1848

University, MS 38677

(662) 915-7396 (dept. office)

**Browser Advisory:** The HTML version of this textbook
requires a browser that supports the display of MathML. A good choice as
of April 2022 is a recent version of Firefox from Mozilla.

Chapter 26 illustrates how to synthesize function definitions from their specifications.

This chapter (27) applies these program synthesis techniques to a larger set of examples on text processing.

In this section we develop a text processing package similar to the one in Section 4.3 of the Bird and Wadler textbook [3]. The text processing package in the Haskell standard Prelude is slightly different in its treatment of newline characters.

A textual document can be viewed in many different ways. At the lowest level, we can view it as just a character string and define a type synonym as follows:

`type Text = String `

However, for other purposes, we may want to consider the document as having more structure (i.e., view it as a sequence of words, lines, paragraphs, pages, etc). We sometimes want to convert the text from one view to another.

Consider the problem of converting a `Text`

document
to the corresponding sequence of lines. Suppose that in the
`Text`

{.haskell document, the newline characters
`'\n'`

{.haskell serve as *separators* of lines, not
themselves part of the lines. Because each line is a sequence of
characters, we define a type synonym `Line`

as
follows:

`type Line = String `

We want a function `lines'`

that will take a `Text`

document
and return the corresponding sequence of lines in the document. The
function has the type signature:

` lines' :: Text -> [Line]`

For example, the Haskell expression

`"This has\nthree\nlines" lines' `

yields:

`"This has", "three ", "lines"] [`

Writing function `lines'`

is not trivial. However, its inverse `unlines'`

is quite easy. Function
`unlines'`

takes a list of
`Line`

s,
inserts a newline character between each pair of adjacent lines, and
returns the `Text`

document
resulting from the concatenation.

` unlines' :: [Line] -> Text`

Let’s see if we can develop `lines'`

from `unlines'`

.

The basic computational pattern for function `unlines'`

is a folding operation.
Because we are dealing with the construction of a list and the list
constructors are nonstrict in their right arguments, a `foldr`

operation seems more appropriate than a `foldl`

operation.

To use `foldr`

, we need
a binary operation that will append two lines with a newline character
inserted between them. The following, a bit more general, operation
`insert'`

will do that for us.
The first argument is the element that is to be inserted between the two
list arguments.

```
insert' :: a -> [a] -> [a] -> [a]
= xs ++ [a] ++ ys -- insert.1 insert' a xs ys
```

Informally, it is easy to see that `(insert' a)`

is an associative
operation but that it has no right (or left) identity element.

Given that `(insert' a)`

has no identity element, there is no obvious “seed” value to use with
`fold`

. Thus we will need to find
a different way to express `unlines'`

.

If we restrict the domain of `unlines'`

to non-nil lists of
lines, then we can use `foldr1`

, a right-folding operation
defined over non-empty lists (in the Prelude). This function does not
require an identity element for the operation. Function `foldr1`

can be
defined as follows:

```
foldr1 :: (a -> a -> a) -> [a] -> a
foldr1 f [x] = x
foldr1 f (x:xs) = f x (foldr1 f xs)
```

Note: There is a similar function (in the Prelude), `foldl1`

that
takes a non-nil list and does a left-folding operation.

Thus we can now define `unlines'`

as follows:

```
unlines' :: [Line] -> Text
= foldr1 (insert' '\n') xss unlines' xss
```

Given the definition of `unlines'`

, we can now specify what
we want `lines'`

to do. It
must satisfy the following specification for any *non-nil* `xss`

of type `[Line]`

:

`= xss lines' (unlines' xss) `

That is, `lines'`

is the
inverse of `unlines'`

for all
non-nil arguments.

The first step in the synthesis of `lines'`

is to guess at a possible
structure for the `lines'`

function definition. Then we will attempt to calculate the unknown
pieces of the definition.

Because `unlines'`

uses a
right-folding operation, it is reasonable to guess that its inverse will
also use a right-folding operation. Thus we speculate that `lines'`

can be defined as follows,
given an appropriately defined operation `op`

and “seed value” `a`

.

```
lines' :: Text -> [Line]
= foldr op a lines'
```

Because of the definition of `foldr`

and type
signature of `lines'`

,
function `op`

must have the type
signature

` op :: Char -> [Line] -> [Line]`

and `a`

must be the right
identity of `op`

and hence have type `[Line]`

.

The task now is to find appropriate definitions for `op`

and `a`

.

From what we know about `unlines'`

, `foldr1`

, `lines'`

, and `foldr`

, we see
that the following identities hold. (These can be proved, but we do not
do so here.)

```
= xs -- unlines.1
unlines' [xs] ++xss) =
unlines' ([xs]'\n' xs (unlines' xss) -- unlines.2
insert'
= a -- lines.1
lines' [] ++xs) = op x (lines' xs) -- lines.2 lines' ([x]
```

Note the names we give each of the above identities (e.g., `unlines.1`

).
We use these equations to justify our steps in the calculations
below.

Next, let us calculate the unknown identity element `a`

. The strategy is to transform `a`

by use of the definition and derived
properties for `unlines'`

and
the specification and derived properties for `lines'`

until we arrive at a
constant.

` a`

$=$
`{`

`lines.1`

(right to left) `}`

` lines' []`

$=$
`{`

`unlines'.1`

(right to left) with `xs = []`

`}`

` lines' (unlines' [[]])`

$=$
`{`

specification of `lines'`

(left to right)
`}`

` [[]]`

Therefore we define `a`

to be `[[]]`

. Note that because of `lines.1`

,
we have also defined `lines'`

in the case where its argument is `[]`

.

Now we proceed to calculate a definition for `op`

. Remember that we assume `xss /= []`

.

As above, the strategy is to use what we know about `unlines'`

and what we have assumed
about `lines'`

to calculate
appropriate definitions for the unknown parts of the definition of `lines'`

. We first expand our
expression to bring in `unlines'`

.

` op x xss`

$=$
`{`

specification for `lines'`

(right to left)
`}`

` op x (lines' (unlines' xss))`

$=$
`{`

`lines.2`

(right to left) `}`

`++ unlines' xss) lines' ([x] `

Because there seems no other way to proceed with our calculation, we
distinguish between cases for the variable `x`

. In particular, we consider the case
where `x`

is the line separator
and the case where it is not, i.e., `x == '\n'`

and `x /= '\n'`

.

**Case x == '\n':**

Our strategy is to absorb the `'\n'`

into the `unlines'`

, then
apply the specification of `lines'`

.

`"\n" ++ unlines' xss) lines' (`

$=$
`{`

`[]`

is the identity for `++`

`}`

`++ "\n" ++ unlines' xss) lines' ([] `

$=$
`{`

`insert.1`

(right to left) with `a == '\n'`

`}`

`'\n' [] (unlines' xss)) lines' (insert' `

$=$
`{`

`unlines.2`

(right to left) `}`

`++ xss)) lines' (unlines' ([[]] `

$=$
`{`

specification of `lines'`

(left to right)
`}`

`++ xss [[]] `

Thus `op '\n' xss = [[]] ++ xss`

.

**Case x /= '\n':**

Our strategy is to absorb the `[x]`

into the `unlines'`

, then apply the
specification of `lines`

.

`++ unlines' xss) lines' ([x] `

$=$
`{`

Assumption `xss /= []`

, let
`xss = [ys] ++ yss`

`}`

`++ unlines' ([ys] ++ yss)) lines' ([x] `

$=$
`{`

`unlines.2`

(left to right) with `a = '\n'`

`}`

`++ insert' '\n' ys (unlines' yss)) lines' ([x] `

$=$
`{`

`insert.1`

(left to right) `}`

`++ (ys ++ "\n" ++ unlines' yss)) lines' ([x] `

$=$
`{`

`++`

associativity `}`

`++ ys) ++ "\n" ++ unlines' yss) lines' (([x] `

$=$
`{`

`insert.1`

(right to left) `}`

`'\n' ([x]++ys) (unlines' yss)) lines' (insert' `

$=$
`{`

`unlines.2`

(right to left) `}`

`++ys] ++ yss)) lines' (unlines' ([[x]`

$=$
`{`

specification of `lines'`

(left to right)
`}`

`++ys] ++ yss [[x]`

Thus, for `x /= '\n'`

and `xss /= []`

:

`= [[x] ++ head xss] ++ (tail xss) op x xss `

To generalize `op`

like we did
`insert'`

and give it a more
appropriate name, we define `op`

to be `breakOn '\n'`

as follows:

```
breakOn :: Eq a => a -> a -> [[a]] -> [[a]]
= error "breakOn applied to nil"
breakOn a x [] | a == x = [[]] ++ xss
breakOn a x xss | otherwise = [[x] ++ ys] ++ yss
where (ys:yss) = xss
```

Thus, we get the following definition for `lines'`

:

```
lines' :: Text -> [Line]
= foldr (breakOn '\n') [[]] xs lines' xs
```

Let’s review what we have done in this example. We have synthesized
`lines'`

from its
specification and the definition for `unlines'`

, its inverse. Starting
from a precise, but non-executable specification, and using only
equational reasoning, we have derived an executable definition of the
required function.

The technique used is a familiar one in many areas of mathematics:

We guessed at a form for the solution.

We then calculated the unknowns.

Note: The definition of `lines`

and
`unlines`

in the standard Prelude treat newlines as line *terminators*
instead of line separators. Their definitions follow.

```
lines :: String -> [String]
lines "" = []
lines s = l : (if null s' then [] else lines (tail s'))
where (l, s') = break ('\n'==) s
unlines :: [String] -> String
unlines = concat . map (\l -> l ++ "\n")
```

Let’s continue the text processing example from the previous subsection a bit further. We want to synthesize a function to break a text into a sequence of words.

For the purposes here, we define a *word* as any nonempty
sequence of characters not containing a space or newline character. That
is, a group of one or more spaces and newlines separate words. We
introduce a type synonym for words.

`type Word = String `

We want a function `words'`

that breaks a line up into a sequence of words. Function `words'`

thus has the following type
signature:

` words' :: Line -> [Word]`

For example, expression

`"Hi there" words' `

yields:

`"Hi", "there"] [`

As in the synthesis of `lines'`

, we proceed by defining the
“inverse” function first, then we calculate the definition for `words'`

.

All `unwords'`

needs to do
is to insert a space character between adjacent elements of the sequence
of words and return the concatenated result. Following the development
in the previous subsection, we can thus define `unwords'`

as follows.

```
unwords' :: [Word] -> Line
= foldr1 (insert' ' ') xs unwords' xs
```

Using calculations similar to those for `lines'`

, we derive the inverse of
`unwords'`

to be the following function:

`foldr (breakOn' ') [[]] `

However, this identifies zero-length words where there are adjacent spaces. We need to filter those out.

```
words' :: Line -> [Word]
= filter (/= []) . foldr (breakOn' ') [[]] words'
```

Note that

`= xss words' (unwords' xss) `

for all `xss`

of type `[Word]`

, but
that

`= xs unwords' (words' xs) `

for some `xs`

of type `Line`

. The
latter is undefined when `words' {.haskell} xs`

returns `[]`

. Where it is defined, adjacent
spaces in `xs`

are replaced by a
single space in `unwords' (words' xs)`

.

Note: The functions `words`

and
`unwords`

in the standard Prelude differ in that `unwords [] = []`

,
which is more complete.

Let’s continue the text processing example one step further and synthesize a function to break a sequence of lines into paragraphs.

For the purposes here, we define a *paragraph* as any nonempty
sequence of nonempty lines. That is, a group of one or more empty lines
separate paragraphs. As above, we introduce an appropriate type
synonym:

`type Para = [Line] `

We want a function `paras'`

that breaks a sequence of lines into a sequence of paragraphs:

` paras' :: [Line] -> [Para]`

For example, expression

`"Line 1.1","Line 1.2","","Line 2.1"] paras' [`

yields:

`"Line 1.1","Line 1.2"],["Line 2.1"]] [[`

As in the synthesis of `lines'`

and `words'`

, we can start with the
inverse and calculate the definition of `paras'`

. The inverse function `unparas'`

takes a sequence of
paragraphs and returns the corresponding sequence of lines with an empty
line inserted between adjacent paragraphs.

```
unparas' :: [Para] -> [Line]
= foldr1 (insert' []) unparas'
```

Using calculations similar to those for `lines'`

and `words'`

, we can derive the
following definitions:

```
paras' :: [Line] -> [Para]
= filter (/= []) . foldr (breakOn []) [[]] paras'
```

The `filter (/= [])`

operation removes all “empty paragraphs” corresponding to two or more
adjacent empty lines.

Note: There are no equivalents of `paras'`

and ``unparas'`

in the standard prelude.
As with `unwords`

, `unparas'`

should be redefined so
that `unparas' [] = []`

,
which is more complete.

Using the six functions in our text processing package, we can build other useful functions.

**Count the lines in a text.**`countLines :: Text -> Int = length . lines' countLines`

**Count the words in a text.**`countWords :: Text -> Int = length . concat . (map words') . lines' countWords`

An alternative using a list comprehension is:

`= countWords xs length [ w | l <- lines' xs, w <- words' l]`

**Count the paragraphs in a text.**`countParas :: Text -> Int = length . paras' . lines' countParas`

**Normalize text by removing redundant empty lines and spaces.**The following functions take advantage of the fact that

`paras'`

and`words'`

discard empty paragraphs and words, respectively.`normalize :: Text -> Text = unparse . parse normalize parse :: Text -> [[[Word]]] = (map (map words')) . paras' . lines' parse unparse :: [[[Word]]] -> Text = unlines' . unparas' . map (map unwords') unparse`

We can also state

`parse`

and`unparse`

in terms of list comprehensions.`= parse xs | l <- p] | p <- paras' (lines' xs) ] [ [words' l = unparse xssss | l<-p] | p<-xssss]) unlines' (unparas' [ [unwords' l`

Section 4.3.5 of the Bird and Wadler textbook [3] goes on to build functions to fill and left-justify lines of text.

Chapter 26 illustrates how to synthesize (i.e., derive or calculate) function definitions from their specifications. This chapter (27) applies these program synthesis techniques to larger set of examples on text processing.

No subsequent chapter depends explicitly upon the program synthesis content from these chapters. However, if practiced regularly, the techniques explored in this chapter can enhance a programmer’s ability to solve problems and construct correct functional programming solutions.

TODO

In Summer 2018, I adapted and revised this chapter and the next from
Chapter 12 of my *Notes on Functional Programming with Haskell*
[9].

These previous notes drew on the presentations in the first edition of the classic Bird and Wadler textbook [3] and other functional programming sources [1,2,15,17,18]. They were also influenced by my research, study, and teaching related to program specification, verification, derivation, and semantics [[4]; [5]; [6]; [7]; [8]; [10]; [11]; [12]; [13]; [14]; [16]; vanGesteren1990].

I incorporated this work as new Chapter 26, Program Synthesis, and
new Chapter 27, Text Processing (this chapter), in the 2018 version of
the textbook *Exploring Languages with Interpreters and Functional
Programming* and continue to revise it.

I retired from the full-time faculty in May 2019. As one of my post-retirement projects, I am continuing work on this textbook. In January 2022, I began refining the existing content, integrating additional separately developed materials, reformatting the document (e.g., using CSS), constructing a bibliography (e.g., using citeproc), and improving the build workflow and use of Pandoc.

I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.

Program synthesis, synthesizing a function from its inverse, text processing, line, word, paragraph, terminator, separator.

[1]

Richard Bird. 1998. *Introduction to
functional programming using Haskell* (Second ed.).
Prentice Hall, Englewood Cliffs, New Jersey, USA.

[2]

Richard Bird. 2015. *Thinking functionall
with Haskell* (First ed.). Cambridge University Press,
Cambridge, UK.

[3]

Richard Bird and Philip Wadler. 1988.
*Introduction to functional programming* (First ed.). Prentice
Hall, Englewood Cliffs, New Jersey, USA.

[4]

K.
Mani Chandy and Jayadev Misra. 1988. *Parallel program design: A
foundation*. Addison-Wesley, Boston, Massachusetts, USA.

[5]

Edward Cohen. 1990. *Programming in the
1990’s: An introduction to the calculation of programs*. Springer,
New York, New York, USA.

[6]

H.
Conrad Cunningham. 1989. The shared dataspace approach to concurrent
computation: The Swarm programming model, notation, and
logic. PhD thesis. Washington University, Department of Computer
Science, St. Louis, Missouri, USA.

[7]

H.
Conrad Cunningham. 2006. *A programmer’s introduction to predicate
logic*. University of Mississippi, Department of Computer
and Information Science, University, Mississippi, USA.
Retrieved from https://john.cs.olemiss.edu/~hcc/csci450/notes/haskell_notes.pdf

[8]

H.
Conrad Cunningham. 2006. *Notes on program semantics and
derivation*. University of Mississippi, Department of Computer
and Information Science, University, Mississippi, USA.
Retrieved from https://john.cs.olemiss.edu/~hcc/reports/umcis-1994-02.pdf

[9]

H.
Conrad Cunningham. 2014. *Notes on functional programming with
Haskell*. University of Mississippi, Department of
Computer and Information Science, University, Mississippi,
USA. Retrieved from https://john.cs.olemiss.edu/~hcc/csci450/notes/haskell_notes.pdf

[10]

Edsger W. Dijkstra. 1976. Updating a sequential
file. In *A discipline of programming*. Prentice Hall, Englewood
Cliffs, New Jersey, USA, 117--122.

[11]

Edsger W. Dijkstra and Wim H. J. Feijen. 1988.
*A method of programming*. Addison-Wesley, TBD.

[12]

Edsger W. Dijkstra and Carel S. Scholten. 1990.
*Predicate calculus and program semantics*. Springer, New York,
New York, USA.

[13]

David Gries. 1981. *Science of
programming*. Springer, New York, New York, USA.

[14]

David Gries and Fred B. Schneider. 2013. *A
logical approach to discrete math*. Springer, New York, New York,
USA.

[15]

Robert R. Hoogerwoord. 1989. The design of
functional programs: A calculational approach. PhD thesis. Eindhoven
Technical University, Eindhoven, The Netherlands.

[16]

Anne Kaldewaij. 1990. *Programming: The
derivation of algorithms*. Prentice Hall, New York, New York,
USA.

[17]

Simon Thompson. 1996. *Haskell: The craft of
programming* (First ed.). Addison-Wesley, Boston, Massachusetts,
USA.

[18]

E.
Peter Wentworth. 1990. *Introduction to functional programming using
RUFL*. Rhodes University, Department of Computer
Science, Grahamstown, South Africa.