and Functional Programming

Chapter 19

**12 October 2020**

Copyright (C) 2016, 2017, 2018, 2020, H. Conrad Cunningham

Professor of Computer and Information Science

University of Mississippi

211 Weir Hall

P.O. Box 1848

University, MS 38677

(662) 915-5358

Professor of Computer and Information Science

University of Mississippi

211 Weir Hall

P.O. Box 1848

University, MS 38677

(662) 915-5358

**Browser Advisory:** The HTML version of this textbook requires a browser that supports the display of MathML. A good choice as of October 2020 is a recent version of Firefox from Mozilla.

This chapter is incomplete!

TODO: Write missing sections

- Cleanup function generalization description
- Add eager evaluated version of
`merge4b`

, perhaps rename`coseq`

- Complete sequential file update example
- Update code and include link
`Conseq.hs`

In a previous chapter, we examined *families* of related functions to define generic, higher-order functions to capture the computational pattern for each family. In this chapter, we approach *function generalization* more systematically.

The systematic function generalization approach begins with a prototype member of the potential family. As in similar techniques for building object-oriented software frameworks, we apply *Scope-Commonality-Variability (SCV) analysis* [Coplien 1998] to the potential family represented by this prototype and produce four outputs.

*scope*: the boundaries of the family. That is, we identify what we should address and what we can ignore, what is in the family and what is not.*terminology*: the definitions of the specialized terms, or concepts, relevant to the family.*commonalities*: the aspects of the family that do not change from one member to another. We seek to reuses these among family members. We sometimes call these the*frozen spots*.*variabilities*: the aspects of the family that may vary from one member to another. We sometimes call these the*hot spots*.

In SCV analysis, we must seek to identify all the implicit assumptions about elements in the family. These implicit assumptions need to be made explicit in family’s design and implementation.

Once we have the above, we incrementally transform the prototype function for each of the hot spots.

A generalizing transformation may replace specific values or data types at a hot spot by parameters. We may make a type more abstract, perhaps making it polymorphic. Or we may break a type into several types if it plays potentially different roles.

Similarly, a generalizing transformation may replace fixed, specialized operations at a hot spot by abstract operations. We may make an abstract operation a higher-order parameter of the generalized function.

*Cosequential processing* concerns the coordinated processing of two ordered sequences to produce some result, often a third ordered sequence. Key requirements include:

Both input sequences must be ordered according to the same total ordering.

The processing should be incremental, where only a few elements of each sequence (perhaps just one) are examined at a time.

This important family includes the ascending merge needed in merge sort, set and bag operations, and sequential file update applications.

Consider a function `merge0`

that takes two ascending sequences of integers and merges them together to form a third ascending sequence.

```
merge0 :: [Int] -> [Int] -> [Int] -- xs, ys
= ys
merge0 [] ys = xs
merge0 xs [] @(x:xs') ys@(y:ys')
merge0 xs| x < y = x : merge0 xs' ys
| x == y = x : merge0 xs' ys'
| x > y = y : merge0 xs ys'
```

The `merge0`

function must satisfy a number of properties:

*Precondition*: The two input lists must be in ascending order.*Postcondition*: The output list must also be in ascending order. The number of times an element appears in the output list is the maximum number of times it appears within one of the two input lists.*Termination*: The sum of the lengths of the two input sequences must decrease by at least one for each call of the recursive function.

For the cosequential processing family, let take function `merge0`

as the prototype member.

Aside: The `merge0`

function differs from the merge function we used in merge sort in a previous chapter. For merge sort, the `x == y`

leg would need to remove the head of only one of the input lists, i.e. be defined as either `x : merge0 xs' ys`

or `x : merge0 xs ys'`

.

Considering the scope and examining the prototype function `merge0`

, we identify the following frozen spots for the family of functions:

The input consists of two sequences ordered by the same total ordering.

The output consists of a sequence ordered by the same total ordering as the input sequences.

The processing is incremental. Each step examines the current element from each input sequence and advances at least one of the input sequences for subsequent steps.

Each step compares the current elements from the input sequences to determine what action to take at that step.

The merge function represents the frozen spots of the family. It gives the common behavior of family members and the relationships among the various elements of the hot spot subsystems.

A *hot spot subsystem* consists of a set of Haskell functions, types, and class definitions that add the desired variability into the merge function.

Again considering the scope and examining the prototype function `merge0`

, we can identify the following hot spots:

Variability in the total ordering used for the input and output sequences, i.e. of the comparison operators and input sequence type.

The ability to have more complex data entities in the input and output sequences, i.e. variability in “record” format.

The ability to vary the input and output sequence structures independently of each other.

Variability in the transformations applied to the data as it passes into the output.

Variability in the sources of the input sequences and destination of the output sequence.

We need to be careful to avoid enumerating hot spots that are unlikely to be needed in an application.

Now let’s analyze each hot spot, design a hot spot subsystem, and carry out the appropriate transformations to generalize the Haskell program.

In the function `merge0`

, the input and output sequences are restricted to elements of type `Int`

and the comparison operations, hence, to the integer comparisons.

The responsibility associated with hot spot #1 is to enable the base type of the sequences to be any type upon which an appropriate ordering is defined. In this transformation, we still consider all three sequences as containing simple values of the same type.

We can generalize the function to take and return sequences of any ordered type by making the type of the list polymorphic. Using a type variable `a`

, we can redefine the type signature to be `[a]`

.

However, we need to constrain type `a`

to be a type for which an appropriate total ordering is defined. We do this by requiring that the type be restricted to those in the predefined Haskell type class `Ord`

. This class consists of the group of types for which all six relational operators are defined.

The function resulting from generalization step is `merge1`

.

```
merge1 :: Ord a => [a] -> [a] -> [a] -- xs, ys
= ys
merge1 [] ys = xs
merge1 xs [] @(x:xs') ys@(y:ys')
merge1 xs| x < y = x : merge1 xs' ys
| x == y = x : merge1 xs' ys'
| x > y = y : merge1 xs ys'
```

This function represents the frozen spots of the cosequential processing framework. The implementation of class `Ord`

used in a program is hot spot #1. To satisfy the requirement represented by frozen spot #1, we require that the two lists `xs`

and `ys`

be in ascending order.

Note that, if we restrict `merge1`

polymorphic type `a`

to `Int`

, then:

`== merge0 xs ys merge1 xs ys `

That is, the generic function `merge1`

can be *specialized* to be equivalent to `merge0`

.

The `merge1`

function works with sequences of any type that have appropriate comparison operators defined. This allows the elements to be of some built-in type such as `Int`

or `String`

or some user-defined type that has been declared as an instance of the `Ord`

class. Thus each individual data item is of a single type.

In general, however, applications in this family will need to work with data elements that have more complex structures. We refer to these more complex structures as *records* in the general sense, not just the Haskell data structure by that name.

The responsibility associated with hot spot #2 is to enable the elements of the sequences to be values with more complex structures, i.e. records. Each record is composed of one or more fields of which some subset defines the key. The value of the key provides the information for ordering the records within that sequence.

In this transformation, we still consider all three sequences as containing simple values of the same type. We abstract the `key`

as a function on the record type that returns a value of some `Ord`

type to enable the needed comparisons. We transform the `merge1`

function by adding `key`

as a higher-order parameter.

The function resulting from this generalization is `merge2`

.

```
merge2 :: Ord b => (a -> b) -> [a] -> [a] -> [a] -- key, xs, ys
= ys
merge2 key [] ys = xs
merge2 key xs [] @(x:xs') ys@(y:ys')
merge2 key xs| key x < key y = x : merge2 key xs' ys
| key x == key y = x : merge2 key xs' ys'
| key x > key y = y : merge2 key xs ys'
```

The higher-order parameter `key`

represents hot spot #2 in the generalized function design.

Hot spot #1 is the implementation of Haskell class `Ord`

for values of type `b`

.

To satisfy the requirement represented by frozen spot #1, the sequence of keys corresponding to each input sequence, i.e. `map key xs`

and `map key ys`

, must be in ascending order.

Also note that

`id xs ys == merge1 xs ys merge2 `

where `id`

is the identity function. Thus `merge1`

is a specialization of `merge2`

.

In `merge2`

, the records are of the same type in all three sequences. The key extraction function is also the same for all sequences.

Some cosequential processing applications, however, require that the record structure vary among the sequences. For example, the sequential file update application usually involves a master file and a transaction file as the inputs and a new master file as the output. The master records and transaction records usually carry different information.

The responsibility associated with hot spot #3 is to enable the three sequences to be varied independently. That is, the records in one sequence may differ in structure from the records in the others.

This requires separate key extraction functions for the two input sequences. These must, however, still return key values from the same total ordering. Because the data types for the two input sequences may differ and both may differ from the output data type, we must introduce record transformation functions that convert the input data types to the output types.

The function resulting from the transformation is `merge3`

.

```
merge3 :: Ord d => (a -> d) -> (b -> d) -- kx, ky
-> (a -> c) -> (b -> c) -- tx, ty
-> [a] -> [b] -> [c] -- xs, ys
= mg xs ys
merge3 kx ky tx ty xs ys where
= map ty ys
mg [] ys = map tx xs
mg xs [] @(x:xs') ys@(y:ys')
mg xs| kx x < ky y = tx x : mg xs' ys
| kx x == ky y = tx x : mg xs' ys'
| kx x > ky y = ty y : mg xs ys'
```

Higher-order parameters `kx`

and `ky`

are the *key* extraction functions for the first and second inputs, respectively. Similarly, `tx`

and `ty`

are the corresponding functions to *transform* those inputs to the output.

Hot spot #3 consists of these four functions. In some sense, this transformation subsumes hot spot #2.

To avoid repetition of the many unchanging arguments in the recursive calls, the definition of `merge3`

uses an auxiliary function definition `mg`

.

The nonrecursive legs use the higher-order library function `map`

. To satisfy the requirement represented by frozen spot #1, the sequence of keys corresponding to each input sequence, i.e. `map kx xs`

and `map ky ys`

, must be in ascending order.

If `xs`

and `ys`

are of the same type, then it is true that:

`id id xs ys == merge2 key xs ys merge3 key key `

Thus `merge2`

is a specialization of `merge3`

.

Function `merge3`

enabled simple one-to-one, record-by-record transformations of the input sequences to create the output sequence. Such simple transformations are not sufficient for practical situations.

For example, in the sequential file update application, each key may be associated with no more than one record in the master file. However, there may be any number of update transactions that must be performed against a master record before the new master record can be output. Thus, there needs to be some local state maintained throughout the processing of all the transaction records associated with one master record.

Before we address the issue of this variation directly, let us generalize the merge function to make the state that currently exists (i.e. the evolving output list) explicit in the parameter list.

To do this, we replace the backward linear recursive `merge3`

function by its *tail recursive* generalization. That is, we add an *accumulating parameter* `ss`

that is used to collects the output during the recursive calls and then to generate the final output when the end of an input sequence is reached.

The initial value of this argument is normally a nil list, but it does enable some other initial value to be prepended to the output list. This transformation is shown as function `merge4a`

below.

```
merge4a :: Ord d => (a -> d) -> (b -> d) -- kx, ky
-> (a -> c) -> (b -> c) -- tx, ty
-> [c] -> [a] -> [b] -> [c] -- ss, xs, ys
= mg ss xs ys
merge4a kx ky tx ty ss xs ys where
= ss ++ map ty ys
mg ss [] ys = ss ++ map tx xs
mg ss xs [] @(x:xs') ys@(y:ys')
mg ss xs| kx x < ky y = mg (ss ++ [tx x]) xs' ys
| kx x == ky y = mg (ss ++ [tx x]) xs' ys'
| kx x > ky y = mg (ss ++ [ty y]) xs ys'
```

Note that the following holds:

`== ss ++ merge3 kx ky tx ty xs ys merge4a kx ky tx ty ss xs ys `

Thus function `merge3`

is a specialization of `merge4a`

.

Unfortunately, building up the state `ss`

requires a relatively expensive appending to the end of a list (e.g. `ss ++ [tx x]`

in the third leg).

Now consider hot spot #4 more explicitly. The responsibility associated with the hot spot is to enable the use of more general transformations on the input sequences to produce the output sequence.

To accomplish this, we introduce an explicit *state* to record the relevant aspects of the computation to some position in the two input sequences. Each call of the merge function can examine the current values from the input sequences and update the value of the state appropriately for the next call.

In some sense, the merge function “folds” together the values from the two input sequences to compute the state. At the end of both input sequences, the merge function then transforms the state into the output sequence.

To accomplish this, we can generalize `merge4a`

. We generalize the accumulating parameter `ss`

in `merge4a`

to be a parameter `s`

that represents the state. We also replace the two simple record-to-record transformation functions `tx`

and `ty`

by more flexible transformation functions `tl`

, `te`

, and `tg`

, that update the state in the three guards of the recursive leg and functions `tty`

and `ttx`

that update the state when the first and second input sequences, respectively, become empty.

For the “equals” guard, the amount that the input sequences are advanced also becomes dependent upon the state of the computation. This is abstracted as functions `nex`

on the first input sequence and `ney`

on the second. To satisfy the requirement represented by frozen spot #3, the pair of functions `nex`

and `ney`

must make the following progress requirement true for each call of `mg`

:

```
if (kx x == ky y) then
length (nex s xs) < length xs) ||
(length (ney s ys) < length ys)
(else True
```

That is, the client of the framework must ensure that at least one of the input sequences will be advanced by at least one element. We also introduce the new function `res`

to take the final state of the computation and return the output sequence.

The above transformation results in function `merge4b`

.

```
merge4b :: Ord d => (a -> d) -> (b -> d) -> -- kx, ky
-> a -> b -> e) -> -- tl
(e -> a -> b -> e) -> -- te
(e -> a -> b -> e) -> -- tg
(e -> [a] -> [a]) -> -- nex
(e -> [b] -> [b]) -> -- ney
(e -> a -> e) -> -- ttx
(e -> b -> e) -> -- tty
(e -> [c]) -> e -> -- res, s
(e -> [b] -> [c] -- xs, ys
[a]
merge4b kx ky tl te tg nex ney ttx tty res s xs ys = mg s xs ys
where
= res (foldl tty s ys)
mg s [] ys = res (foldl ttx s xs)
mg s xs [] @(x:xs') ys@(y:ys')
mg s xs| kx x < ky y = mg (tl s x y) xs' ys
| kx x == ky y = mg (te s x y) (nex s xs) (ney s ys)
| kx x > ky y = mg (tg s x y) xs ys
```

The function uses the Prelude function `foldl`

in the first two legs. This function continues the computation beginning with the state computed by the recursive leg and processes the remainder of the nonempty input sequence by “folding” the remaining elements as defined in the functions `ttx`

and `tty`

.

As was the case for `merge3`

, frozen spot #1 requires that `map kx xs`

and `map ky ys`

be in ascending order for calls to `merge4b`

.

Hot spot #4 consists of the eight functions `tl`

, `te`

, `tg`

, `ttx`

, `tty`

, `nex`

, `ney`

, and `res`

. The following property also holds:

```
merge4b kx ky -> ss ++ [tx x]) -- tl
(\ss x y -> ss ++ [tx x]) -- te
(\ss x y -> ss ++ [ty y]) -- tg
(\ss x y -> tail xs) -- nex
(\ss xs -> tail ys) -- ney
(\ss ys -> ss ++ [x]) -- ttx
(\ss x -> ss ++ [y]) -- tty
(\ss y id ss xs ys -- res, ss, xs, ys
== merge4a kx ky tx ty ss xs ys
== ss ++ merge3 kx ky tx ty xs ys
```

That is, we can define the general transformation functions so that they have the same effect as the record-to-record transformations of `merge4a`

. The statement of this property uses the anonymous functions (lambda expression) feature of Haskell.

Thus function `merge3`

is a specialization of `merge4a`

, which in turn is a specialization of function `merge4b`

.

A problem with the above “implementation” of `merge3`

is that the `merge4b`

parameters `tl`

, `te`

, `tg`

, `ttx`

, and `tty`

all involve an expensive operation to append to the end of the list `ss`

.

An alternative would be to build the state sequence in reverse order and then reverse the result as shown below.

```
merge4b kx ky -> reverse (tx x) ++ ss) -- tl
(\ss x y -> reverse (tx x) ++ ss) -- tl
(\ss x y -> reverse (ty y) ++ ss) -- tg
(\ss x y -> tail xs) -- nex
(\ss xs -> tail ys) -- ney
(\ss ys -> x : ss) -- ttx
(\ss x -> y : ss) -- tty
(\ss y reverse ss xs ys -- res, ss, xs, ys
== merge4a kx ky tx ty ss xs ys
== ss ++ merge3 kx ky tx ty xs ys
```

TODO: Possibly include a version with selective eager evaluation (similar to below) and rename coseq.

```
merge4c :: Ord d => (a -> d) -> (b -> d) -> -- kx, ky
-> a -> b -> e) -> -- tl
(e -> a -> b -> e) -> -- te
(e -> a -> b -> e) -> -- tg
(e -> [a] -> [a]) -> -- nex
(e -> [b] -> [b]) -> -- ney
(e -> a -> e) -> -- ttx
(e -> b -> e) -> -- tty
(e -> [c]) -> e -> -- res, s
(e -> [b] -> [c] -- xs, ys
[a]
merge4b kx ky tl te tg nex ney ttx tty res s xs ys = mg s xs ys
where
= res (foldl tty s ys)
mg s [] ys = res (foldl ttx s xs)
mg s xs [] @(x:xs') ys@(y:ys')
mg s xs| kx x < ky y = (mg $! (tl s x y)) xs' ys
| kx x == ky y = (mg $! (te s x y)) ((nex $! s) xs)
$! s) ys)
((ney | kx x > ky y = (mg $! (tg s x y)) xs ys'
```

Hot spot #5 concerns the ability to take the input sequences from many possible sources and to direct the output to many possible destinations.

In the Haskell merge functions, these sequences are represented as the pervasive polymorphic list data type. The redirection is simply a matter of writing appropriate functions to produce the input lists and to consume its output list. No changes are needed to the `merge4b`

function itself.

Of course, for any expressions (e.g. function calls) `ex`

and `ey`

that generate the input sequence arguments `xs`

and `ys`

of `merge4b`

, it must be the case that sequences `map kx ex`

and `map ky ey`

are ascending.

TODO: Possibly reexpress some of the lambdas above with standard combinators.

Mathematically, a *bag* (also called a *multiset*) is an unordered collection of elements in which each element may occur one or more times. We can model a bag as a total function (called the *multiplicity function*) over the domain of elements to the natural numbers where the numbers 0 and above denote the number of occurrences of the element in the bag.

A *set* is thus a bag for which there is no more than one occurence of any element.

If we restrict the elements to a Haskell data type that is an instance of class `Ord`

, we can represent a bag by an ascending list of values and a set by an increasing list of values. With this representation, we can implement the bag an set operations as special cases of cosequential processing.

The *intersection* of two bags consists of only the elements that occur in both bags such that the number of occurrences is the minimum number for the two input bags. We can express the bag intersection of two ascending lists in terms of `merge4b`

as follows.

```
=
bagIntersect xs ys id id
merge4b -> s)
(\s x y -> x:s)
(\s x y -> s)
(\s x y -> tail xs)
(\s xs -> tail ys)
(\s ys -> s)
(\s x -> s)
(\s y reverse [] xs ys
```

This function only adds an element to the output when it occurs in both input lists.

If we require the two input lists to be increasing, the above also implements set intersection.

The *sum* of two bags consists of the elements that occur in either bag such that the number of occurrences is the total number for both bags. We can express the bag sum of two ascending lists in terms of `merge4b`

as follows.

```
=
bagSum xs ys id id
merge4b -> x:s)
(\s x y -> x:y:s)
(\s x y -> y:s)
(\s x y -> tail xs)
(\s xs -> tail ys)
(\s ys -> x:s)
(\s x -> y:s)
(\s y reverse [] xs ys
```

The *union* of two bags consists of the elements that occur in either bag such that the number of occurrences is the minimum number in the two input lists. The prototype function `merge0`

implements this operation on ascending lists.

The *subtraction* of bag `B`

from bag `A`

, denoted `A - B`

, consists of only the elements that occur in both bags such that the number of occurrences is the number of occurrences in `A`

minus those in `B`

.

Questions:

How can we represent set union in terms of

`merge4b`

?How can we represent a merge function that can be used in the merge sort of two lists (whose elements are from an instance of class

`Ord`

)?How can we implement a bag union function

`bagUnion`

in terms of`merge4b`

?How can we implement a bag subtraction function

`bagSub xs ys`

in terms of`merge4b`

?If the elements of the input lists are not instances of class

`Ord`

, how can we implement bag union? bag intersection?

TODO: Describe this!?

```
-- Simple Master-Transaction Update
-- Master increasing list [(Account,Amount)]
-- with Account < maxAccount
-- Transaction ascending list [(Account,Amount)]
-- with Account < maxAccount
-- Result is new Master increasing list [(Account,Amount)]
-- with Account < maxAccount
type Account = Int
type Amount = Integer
seqUpdate :: [(Account,Amount)] -> [(Account,Amount)]
-> [(Account,Amount)]
= merge4c fst fst
seqUpdate
masterlt mastereq mastergt
masternext transnext
notrans nomaster
getResult initState
= ([],[])
initState
= maxBound :: Account
maxAccount
= (m:out,[])
masterlt (out,[]) m t -- no transactions for this master
= (cur:out,[])
masterlt (out,[cur]) m t -- processed all transactions for this master
mastereq (out,[]) (ma,mb) (_,tc) = (out, [(ma,mb+tc)]) -- first transaction
mastereq (out,[(sa,sb)]) (_,_) (_,tc) = (out, [(sa,sb+tc)]) -- subsequent transaction
mastergt (_,_) m t= error ("Transactions not ascending at " ++ show t)
= ms -- do not advance master on eq
masternext (_,_) ms
= let ys = tail ts
transnext (_,_) ts in if null ys then [(maxAccount,0)] else ys
-- advance transaction on eq
-- force master with final (maxAccount,0)
= (m:out,[])
notrans (out,[]) m = (m:cur:out,[])
notrans (out,[cur]) m
-- only for empty master list
nomaster ([],[]) t = error ("Unmatched transaction " ++ show t)
-- transaction list ended
nomaster (out,[]) (maxInt,_) = (out,[])
nomaster _ t= error ("Unmatched transaction " ++ show t)
= reverse nms getResult (nms,[])
```

This case study illustrates the function generalization method. It begins with a simple Haskell program to merge two ascending lists of integers into third ascending list of integers. This program is generalized in a step by step fashion to produce a new program that is capable of carrying out any operation from the family of cosequential processing programs.

Although some members of the cosequential processing family can be rather complicated, the family has the characteristic that the primary driver for the algorithm can be concisely stated as a simple loop (i.e. recursive function).

TODO

TODO

In Summer 2016, I adapted and revised the discussion of function generalization from my and Pallavi Tadepalli’s paper “Using Function Generalization to Design a Cosequential Processing Framework” [Cunningham 2006a] to form partial Chapter 6, Developing Functional Programs, in the 2017 version of this textbook. (Paper [Cunningham 2006b] present similar work by me, Yi Liu, and Pallavi Tadepalli.)

In Summer 2018, I repurposed the partial Developing Functional Programs chapter and made it Chapter 19, Systematic Generalization, in the 2018 version of the textbook, now titled *Exploring Languages with Interpreters and Functional Programming*.

I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.

- [Coplien 1998]:
- J. Coplien, D. Hoffman, and D. Weiss. Commonality and Variability in Software Engineering,
*IEEE Software*, 15(6):37–45, November 1998. - [Cunningham 2006a]:
- H. Conrad Cunningham and Pallavi Tadepalli. Using Function Generalization to Design a Cosequential Processing Framework, In
*Proceedings of the 39th Hawaii International Conference on System Sciences*, January 2006. - [Cunningham 2006b]:
- H. Conrad Cunningham, Yi Liu, and P. Tadepalli. Framework Design using Function Generalization: A Binary Tree Traversal Case Study, In
*Proceedings of the ACM SouthEast Conference*, pp. 312-318, March 2006.

TODO