Exploring Languages
with Interpreters
and Functional Programming
Chapter 22
H. Conrad Cunningham
21 April 2022
Browser Advisory: The HTML version of this textbook requires a browser that supports the display of MathML. A good choice as of April 2022 is a recent version of Firefox from Mozilla.
This chapter (22) revisits the specification, design, and implementation of data abstraction modules in Haskell. It follows the general approach introduced in Chapter 7 but uses algebraic data types introduced in Chapter 21 to represent the data. An algebraic data enables the Haskell module implementing the abstraction to encapsulate the details of the data structure.
The goals of this chapter are to:
reinforce the methods for specification and design of data abstractions
illustrate how to use Haskell modules and algebraic data types to enforce the encapsulation of a module’s implementation secrets
introduce additional concepts and terminology for data abstractions
The concepts and terminology in this chapter are mostly general. They are applicable to most any language. Here we look specifically at Haskell and focus on the details of one application. (I have implemented basically the same data abstraction module in Scala and Elixir.)
Chapter 7 used the term data abstraction.
This chapter uses the related term abstract data type [4] to refer to a data abstraction encapsulated in an information-hiding module. The data abstraction module defines and exports a user-defined type (i.e., an algebraic data type) and a set of operations (i.e., functions) on that type. The type is abstract in the sense that its concrete representation is hidden; only the module’s operations may manipulate the representation directly.
For convenience, this chapter sometimes uses acronym ADT to refer to an abstract data type.
In Chapters 6 and 7, we explored the concepts of contracts, which include preconditions and postconditions for the functions in the module and interface and implementation) invariants for the data created and manipulated by the module. For convenience, this chapter refers to these as the abstract model for the ADT.
In this chapter, we develop a family of doubly labelled digraph data structures.
As a graph, the data structure consists of a finite set of vertices (nodes) and a set of edges. Each edge connects two vertices. (Some writers require that the set of vertices be nonempty, but here we prefer to allow an empty graph to have no vertices. But the question remains whether such a graph with no vertices is pointless concept.)
As a directed graph (or digraph), each pair of vertices has at most one edge connecting them; the edge has a direction from one of the edges to the other.
As a doubly labelled graph, each vertex and each edge has some user-defined data (i.e., labels) attached.
This chapter draws on the discussion of digraphs and their specification in Chapters 1 and 10 of the Dale and Walker book Abstract Data Types [4].
For what purpose can we use a doubly labelled digraph data structure?
One concrete use case is to represent the game world in an implementation of an adventure game.
For example, in the Wizard’s Adventure Game from Chapter 5 of Land of Lisp: Learn to Program in Lisp, One Game at a Time [1], the game’s rooms become vertices, passages between rooms become edges, and descriptions associated with rooms or passages become labels on the associated vertex or edge (as shown in Figure 22.1).
Aside: By using a digraph to model the game world, we disallow multiple passages directly from one room to another. By changing the graph to a multigraph, we can allow multiple directed edges from one vertex to another.
The Adventure game must create and populate the game world initially, but it does not typically modify the game world during play. It maintains the game state (e.g., player location) separately from the game world. A player moves from room to room during play; the labelled digraph provides the static structure and descriptions of the game world.
How can we define an abstract data type?
The behavior of an ADT is defined by a set of operations that can be applied to an instance of the ADT.
Each operation of an ADT can have inputs (i.e., parameters) and outputs (i.e., results). The collection of information about the names of the operations and their inputs and outputs is the interface of the ADT.
To specify an ADT, we need to give:
the name of the ADT
the sets (or domains) upon which the ADT is built
These include the type being defined and the auxiliary types (e.g., primitive data types and other ADTs) used as parameters or return values of the operations.
the signatures (syntax or structure) of the operations
the semantics (or meaning) of the operations
Note: In this chapter, we more state the specification of the data abstraction more systematically than in Chapter 7. But we are doing essentially the same things we did for the Rational Arithmetic modules in Chapter 7.
We categorize an ADT’s operations into four groups depending upon their functionality:
A constructor (sometimes called a creator, factory, or producer function) constructs and initializes an instance of the ADT.
A mutator (sometimes called a modifier, command, or setter function) returns the instance with its state changed.
An accessor (sometimes called an observer, query, or getter function) returns information from the state of an instance without changing the state.
A destructor destroys an instance of the ADT.
We normally list the operations in that order.
For a language with immutable data structures like Haskell, a mutator returns a distinct new instance of the ADT with a state that is a modified version of the original instance’s state. That is, we are taking an applicative (or functional or referentially transparent) approach to ADT specifications.
Note: Of course, in an imperative language, a mutator can change the state of an instance in place. That may be more efficient, but it tends to be less safe. It also tends to make concurrent use of an abstract data type more problematic.
Technically speaking, a destructor is not an operation of the ADT. We can represent the other types of operations as functions on the sets in the specification. However, we cannot define a destructor in that way. But destructors are of pragmatic importance in the implementation of ADTs, particularly in languages that do not have automatic storage reclamation (i.e., garbage collection).
There are two primary approaches for specifying the semantics of the operations:
The axiomatic (or algebraic) approach gives a set of logical rules (properties or axioms) that relate the operations to one another. The meanings of the operations are defined implicitly in terms of each other.
The constructive (or abstract model) approach describes the meaning of the operations explicitly in terms of operations on other abstract data types. The underlying model may be any well-defined mathematical model or a previously defined ADT.
In some ways, the axiomatic approach is the more elegant of the two approaches. It is based in the well-established mathematical fields of abstract algebra and category theory. Furthermore, it defines the new ADT independently of other ADTs. To understand the definition of the new ADT it is only necessary to understand its axioms, not the semantics of a model.
However, in practice, the axiomatic approach to specification becomes very difficult to apply in complex situations. The constructive approach, which builds a new ADT from existing ADTs, is the more useful methodology for most practical software development situations.
In this chapter, we use the constructive approach.
Now let’s look at a constructive specification of the doubly labelled digraph.
First, we specify the ADT as an implementation-independent abstraction. The secret of the ADT module is the data structure used internally to implement the doubly labelled digraph.
Then, we examine two implementations of the abstraction:
using Haskell lists to represent the vertex and edge sets
using a Haskell Map
to map a
vertex to the set of outgoing edges from that vertex
Before we specify the ADT, let’s define the mathematical notation we use. We choose notation that can readily be used in comments in program.
We use the the plain-text specification notation to describe the abstract data type’s model and its semantics. The following document summarizes this notation:
TODO:
We name the abstract data type being defined to be Digraph
.
We specify that this abstract data type be represented by a Haskell
algebraic data type Digraph a b c
,
which has three type parameters (i.e., sets):
VertexType
,
the set of possible vertices (i.e., vertex identifiers) in the Digraph
VertexLabelType
,
the set of possible labels on vertices in the Digraph
EdgeLabelType
,
the set of possible labels on edges in the Digraph
Given this ADT defines a digraph, edges can be identified by ordered pairs (tuples) of vertices. Values from the above types, in particular the labels, may have several components.
We define the following operations on the Labelled Digraph ADT (shown below as Haskell function signatures).
Given the primary use case described above, we specify a constructor
to create an empty graph (new_graph
), a mutator to add a
new vertex (add_vertex
), and mutator to add a new edge
between existing vertices (add_edge
).
We also specify mutators to remove vertices
(remove_vertex
) and edges (remove_edge
) and to
update the labels on vertices (update_vertex
) and edges
(update_edge
). (Note: In the identified use case, these are
likely used less often than the mutators that add new vertices and
edges.)
We specify a single constructor with the following signature:
new_graph :: Digraph a b c
We specify six mutators with the following signatures:
add_vertex :: Digraph a b c -> a -> b -> Digraph a b c
remove_vertex :: Digraph a b c -> a -> Digraph a b c
update_vertex :: Digraph a b c -> a -> b -> Digraph a b c
add_edge :: Digraph a b c -> a -> a -> c -> Digraph a b c
remove_edge :: Digraph a b c -> a -> a -> Digraph a b c
update_edge :: Digraph a b c -> a -> a -> c -> Digraph a b c
We specify query functions to check whether the labelled digraph is
empty (is_empty
), has a given vertex
(has_vertex
), and has an edge between two vertices
(has_edge
).
We also specify accessors to retrieve the label associated with a
given vertex (get_vertex
) and edge
(get_edge
).
Given the identified use case, we also specify accessors to return
lists of all vertices in the graph (all_vertices
) and of
just their labels (all_vertices_labels
) and to return lists
of all outgoing edges from a vertex (from_edges
) and of
just their labels (from_edges_labels
).
We thus specify nine accessors with the following signatures:
is_empty :: Digraph a b c -> Bool
get_vertex :: Digraph a b c -> a -> b
has_vertex :: Digraph a b c -> a -> Bool
get_edge :: Digraph a b c -> a -> a -> c
has_edge :: Digraph a b c -> a -> a -> Bool
all_vertices :: Digraph a b c -> [a]
from_edges :: Digraph a b c -> a -> [a]
all_vertices_labels :: Digraph a b c -> [(a,b)]
from_edges_labels :: Digraph a b c -> a -> [(a,c)]
TODO: Consider changing get_vertex
and
get_edge
to return Maybe b
and
Maybe c
, respectively.
Given the identified use case and that Haskell uses garbage collection, no destructor seems to be needed in most cases.
We model the state of the instance of the Labelled Digraph
ADT with an abstract value G
such that
G = (V,E,VL,EL)
with G
’s components satisfying
the following Labelled Digraph Properties.
V
is a finite subset of values from the set VertexType
.
V
denotes the vertices (or nodes) of the digraph.
Any two elements of V
can be compared for
equality.
E
is a binary relation on the set V
. A
pair (v1,v2) IN E
denotes that there is a directed edge
from v1
to v2
in the digraph.
Note that this model allows at most one (directed) edge from a vertex
v1
to vertex v2
. It allows a directed edge
from a vertex to itself.
Also, because vertices can be compared for equality, any two edges can also be compared for equality.
VL
is a total function from set V
to
the set VertexLabelType
.
EL
is a total function from set E
to
the set EdgeLabelType
.
We define the following interface invariant for the Labelled Digraph ADT:
Any valid labelled digraph instance
G
, appearing in either the arguments or return value of a public ADT operation, must satisfy the Labelled Digraph Properties.
We specify the various ADT operations below using their type signatures, preconditions, and postconditions. Along with the interface invariant, these comprise the (implementation-independent) specification of the ADT (i.e., its abstract interface).
In these assertions, for a digraph g
that satisfies the
invariants, G(g)
denotes its abstract
model(V,E,VL,EL)
as described above. The value
Result
denotes the return value of function.
TODO: Consider in what order these should appear.
Constructor new_graph
creates and returns a new empty instance of the graph ADT.
Precondition:
True
Postcondition:
G(Result) == ({},{},{},{})
Accessor is_empty g
returns True
if and
only if graph g
is empty.
Precondition:
G(g) = (V,E,VL,EL)
Postcondition:
Result == (V == {} && E == {})
Mutator add_vertex g nv nl
inserts vertex
nv
with label nl
into graph g
and returns the resulting graph.
Precondition:
G(g) = (V,E,VL,EL) && nv NOT_IN V
Postcondition:
G(Result) == (V UNION {nv}, E, VL UNION {(nv,nl)}, EL)
Mutator remove_vertex g ov
deletes vertex
ov
from graph g
and returns the resulting graph.
Precondition:
G(g) = (V,E,VL,EL) && ov IN V
Postcondition:
G(Result) == (V', E', VL', EL')
where V' = V - {ov}
E' = E - {(ov,*),(*,ov)}
VL' = VL - {(ov,*)}
EL' = EL - {((ov,*),*),((*,ov),*)}
Mutator update_vertex g ov nl
changes the
label on vertex ov
in graph
g
to be nl
and returns the resulting
graph.
Precondition:
G(g) = (V,E,VL,EL) && ov IN V
Postcondition:
G(Result) == (V - {ov}, E, VL', EL)
where VL' = (VL - {(ov,VL(ov))}) UNION {(ov,nl)}
Accessor get_vertex g ov
returns the label from vertex ov
in graph g
TODO: If signature changed to return Maybe
, change
precondition and postcondition appropriately.
Precondition:
G(g) = (V,E,VL,EL) && ov IN V
Postcondition:
Result == VL(ov)
Accessor has_vertex g ov
returns True
if and
only if ov
is a vertex of graph
g
.
Precondition:
G(g) = (V,E,VL,EL)
Postcondition:
G(Result) == ov IN V
Mutator add_edge g v1 v2 nl
inserts an edge
from vertex v1
to vertex v2
in graph g
and returns the resulting graph.
Precondition:
G(g) = (V,E,VL,EL) && v1 IN V && v2 IN V &&
(v1,v2) NOT_IN E
Postcondition:
G(Result) == (V, E', VL, EL')
where E' = E UNION {(v1,v2)}
EL' = EL UNION {((v1,v2),nl)}
Mutator remove_edge g v1 v2
deletes the edge
from vertex v1
to vertex v2
from graph g
and returns the resulting graph.
Precondition:
G(g) = (V,E,VL,EL) V - {ov} && (v1,v2) IN E
Postcondition:
G(Result) == (V, E - {(v1,v2)}, VL, EL - { ((v1,v2),*) }
Mutator update_edge g v1 v2 nl
changes the
label on the edge from vertex v1
to vertex v2
in graph g
to have label nl
and
returns the resulting graph.
Precondition:
G(g) = (V,E,VL,EL) && (v1,v2) IN E
Postcondition:
G(Result) == (V, E, VL, EL')
where EL' == (EL - {((v1,v2),*)}) UNION {((v2,v2),nl)
Accessor get_edge g v1 v2
returns the label on the edge from vertex v1
to vertex v2
in graph g
.
TODO: If signature changed to return Maybe
, change
precondition and postcondition appropriately.
Precondition:
G(g) = (V,E,VL,EL) && (v1,v2) IN E
Postcondition:
Result == EL((v1,v2))
Accessor has_edge g v1 v2
returns True
if and
only if there is an edge from a vertex v1
to a vertex v2
in graph g
.
Precondition:
G(g) = (V,E,VL,EL)
Postcondition:
Result == (v1,v2) IN E
Accessor all_vertices g
returns a sequence of all the vertices in graph g
. The returned sequence is
represented by a builtin Haskell list.
Precondition:
G(g) = (V,E,VL,EL)
Postcondition:
(ForAll ov: ov IN Result <=> ov IN V) &&
length(Result) == size(V)
Accessor from_edges g v1
returns a sequence of all vertices v2
such that there is an edge from
vertex v1
to vertex v2
in graph g
. The returned sequence is
represented by a builtin Haskell list.
Precondition:
G(g) = (V,E,VL,EL) && v1 IN V
Postcondition:
(ForAll v2: v2 IN Result <=> (v1,v2) IN E) &&
length(Result) == (# v2 :: (v1,v2) IN E)
TODO: Function from_edges g v1
should return []
when v1
does not appear in g
, so that it can work well with the
Wizard’s Adventure game. We should redefine the precondition and
postcondition to specify this behavior.
Accessor all_vertices_labels g
returns a
sequence of all pairs (v,l)
such
that v
is a vertex and l
is it’s label in graph g
. The returned sequence is
represented by a builtin Haskell list.
Precondition:
G(g) = (V,E,VL,EL)
Postcondition:
(ForAll v, l: (v,l) IN Result <=> (v,l) IN VL) &&
length(Result) == size(VL)
Accessor from_edges_labels g v1
returns a
sequence of all pairs (v2,l)
such that there is an edge (v1,v2)
labelled with l
in graph g
.
Precondition:
G(g) = (V,E,VL,EL) && v1 IN V
Postcondition:
(ForAll v2, l :: (v2,l) IN Result <=> ((v1,v2),l) IN EL)
&& length(Result) == (# v2 :: (v1,v2 ) IN E)
TODO: Function from_edges_labels g v1
should return
[]
when v1
does not appear in g
, so that it can work well with the
Wizard’s Adventure game. We should redefine the precondition and
postcondition to specify this behavior.
Below we state the header for a Haskell module Digraph_XXX
that implements the Labelled Digraph ADT. The module name suffix XXX
denotes
the particular implementation for a data representation, but the
signatures and semantics of the operations are the same regardless of
representation.
The module exports data type Digraph
, but
its constructors are not exported. This allows modules that import Digraph_XXX
to
use the data type without knowing how the data type is implemented.
If we had Digraph(..)
in the export list, then the data type and all its constructors would be
exported.
The intention of this interface is to constrain the type parameters
of Digraph a b c
so that:
Type a
(i.e., type VertexType
)
must be in Haskell class Eq
. This is
essentially required by the interface invariant (i.e., the Labelled
Digraph Properties).
Types a
, b
, and c
(i.e., types VertexType
,
VertexLabelType
,
and EdgeLabelType
)
must be in Haskell class Show
. This
contraint enables the vertices and labels to be displayed as
text.
It may be desirable (or necessary) for an implementation to further
constrain the type parameters. For example, some implementations may
need to constrain VertexType
to
be from class Ord
(i.e.,
totally ordered). It does not seem to restrict the generality of the ADT
significantly to require that vertices be drawn from a totally ordered
set such as integers, strings, or enumerated types.
module DigraphADT_XXX
Digraph --constraints (Eq a, Show a, Show b, Show c)
( --Digraph a b c
, new_graph --Digraph a b c -> Bool
, is_empty --Digraph a b c -> a -> b -> Digraph a b c
, add_vertex --Digraph a b c -> a -> Digraph a b c
, remove_vertex--Digraph a b c -> a -> b -> Digraph a b c
, update_vertex--Digraph a b c -> a -> b
, get_vertex --Digraph a b c -> a -> Bool
, has_vertex --Digraph a b c -> a -> a -> c -> Digraph a b c
, add_edge --Digraph a b c -> a -> a -> Digraph a b c
, remove_edge --Digraph a b c -> a -> a -> c -> Digraph a b c
, update_edge --Digraph a b c -> a -> a -> c
, get_edge --Digraph a b c -> a -> a -> Bool
, has_edge --Digraph a b c -> [a]
, all_vertices --Digraph a b c -> a -> [a]
, from_edges --Digraph a b c -> [(a,b)]
, all_vertices_labels--Digraph a b c -> a -> [(a,c)]
, from_edges_labels
)where -- definitions for the types and functions
Note: The Glasgow Haskell Compiler (GHC) release 8.2 (July 2017) and the Cabal-Install package manager release 2.0 (August 2017) support a new mixin package system called Backpack. This extension would enable us to define an abstract module “DigraphADT” as a signature file with the above interface. Other modules can then implement this abstract interface thus giving a more explicit and flexible definition of this abstract data type.
This section gives an implementation of the ADT that uses Haskell lists to represent the vertex and edge sets.
We represent the List implementation of the Labelled Digraph ADT as
an instance of the Haskell algebraic data type Digraph
as
shown below. (Remember that type variable a
is VertexType
,
b
is VertexLabelType
,
and c
is EdgeLabelType
.)
data Digraph a b c = Graph [(a,b)] [(a,a,c)]
In an instance (Graph vs es)
:
vs
is a list of tuples (v,vl)
where
v
has VertexType
and
represents a vertex of the digraphvl
has VertexLabelType
and is the unique label associated with vertex v
v
occurs at most once in vs
(i.e., vs
encodes a function from vertices to vertex
labels)es
is a list of tuples ((v1,v2),el)
where
v1
and v2
are vertices occurring in
vs
, representing a directed edge from v1
to
v2
el
has EdgeLabelType
and is the unique label associated with edge (v1,v2)
(v1,v2)
occurs at most once in es
(i.e., es
encodes a function from edges to edge
labels)In terms of the abstract model, vs
encodes
VL
directly and, because VL
is a total
function on V
, it encodes V
indirectly.
Similarly, es
encodes EL
directly and
E
indirectly.
Of course, there are many other ways to represent the graph as lists. This representation is biased for a context where, once built, the labelled digraph is relatively static and the most frequent operations are the retrieval of labels attached to vertices or edges. That is, it is biased toward the Adventure game use case.
Given that all the type parameters must be of class Show
, we also
define Digraph
to
also be of class Show
as
defined below.
instance (Show a, Show b, Show c) =>
Show (Digraph a b c) where
show (Graph vs es) =
"(Digraph " ++ show vs ++ ", " ++ show es ++ ")"
Given the above description, we then define the following implementation (representation) invariant for the list-based version of the Labelled Digraph ADT:
Any Haskell
Digraph
value(Graph vs es)
with abstract modelG = (V,E,VL,EL)
, appearing in either the arguments or return value of an operation, must also satisfy the following:
(ForAll v, l :: (v,l) IN vs <=> (v,l) IN VL ) &&
(ForAll v1, v2, m :: (v1,v2,m) IN es <=> ((v1,v2),m) IN EL )
The code in this section shows a list-based implementation for several of the operations related to vertices.
The Haskell module for the list representation of the Labelled
Digraph ADT is in source file DigraphADT_List.hs
. A simple smoke
test driver module is in source file DigraphADT_TestList.hs
.
The implementations of constructor new_graph
and accessor
is_empty
are straightforward.
new_graph :: (Eq a, Show a, Show b, Show c) =>
Digraph a b c
= Graph [] []
new_graph
is_empty :: (Eq a, Show a, Show b, Show c) =>
Digraph a b c -> Bool
Graph [] _ ) = True
is_empty (= False is_empty _
Function has_vertex
just needs to
search through the list of vertices to determine whether or not the
vertex occurs. It relies upon VertexType
being in class
Eq
.
has_vertex :: (Eq a, Show a, Show b, Show c) =>
Digraph a b c -> a -> Bool
Graph vs _) ov =
has_vertex (not (null [ n | (n,_) <- vs, n == ov])
Because of lazy evaluation, the list comprehension only needs to evaluate far enough to find the occurrence of the vertex in the list.
To add a new vertex and its label to the graph, add_vertex
must return a new graph
with the new vertex-label pair added to the head of the vertex list. To
meet the specification, it must not allow a vertex to be added if the
vertex already occurs in the list.
add_vertex :: (Eq a, Show a, Show b, Show c) =>
Digraph a b c -> a -> b -> Digraph a b c
@(Graph vs es) nv nl
add_vertex g| not (has_vertex g nv) = Graph ((nv,nl):vs) es
| otherwise = error has_nv
where has_nv =
"Vertex " ++ show nv ++ " already in digraph"
Function remove_vertex
is a
bit trickier with this representation. To remove an existing vertex and
its label from the graph, remove_vertex
must return a new graph
with that vertex’s tuple removed from the list of vertices and with any
outgoing edges also removed from the list of edges.
remove_vertex :: (Eq a, Show a, Show b, Show c) =>
Digraph a b c -> a -> Digraph a b c
@(Graph vs es) ov
remove_vertex g| has_vertex g ov = Graph ws fs
| otherwise = error no_ov
where ws = [ (w,m) | (w,m) <- vs, w /= ov ]
= [ (v1,v2,m) |
fs <- es, v1 /= ov, v2 /= ov ]
(v1,v2,m) = "Vertex " ++ show ov ++ " not in digraph" no_ov
The implementation of remove_vertex
filters all occurrences
of the vertex from the list of vertices. Given the implementation
invariant, this is not necessary. However, this potentially adds some
safety to the implementation at the possible expense of execution
time.
For an existing vertex in the list of vertices, function update_vertex
replaces the old label
with the new label. Like remove_vertex
, it potentially
processes the entire list of vertices and makes the change to all
occurrences, when the implementation invariant would allow it to stop on
the first (and only) occurrence.
update_vertex :: (Eq a, Show a, Show b, Show c) =>
Digraph a b c -> a -> b -> Digraph a b c
@(Graph vs es) ov nl
update_vertex g| has_vertex g ov = Graph (map chg vs) es
| otherwise = error no_ov
where chg (w,m) = (if w == ov then (ov,nl) else (w,m))
= "Vertex " ++ show ov ++ " not in digraph" no_ov
For an existing vertex, function get_vertex
retrieves the label.
Because of lazy evaluation, the search of the list of vertices stops
with the first occurrence.
TODO: Modify appropriately if changed to Maybe
return.
get_vertex :: (Eq a, Show a, Show b, Show c) =>
Digraph a b c -> a -> b
Graph vs _) ov
get_vertex (| not (null ls) = head ls
| otherwise = error no_ov
where ls = [ l | (w,l) <- vs, w == ov]
= "Vertex " ++ show ov ++ " not in digraph" no_ov
TODO: Modify source file appropriately if changed to
Maybe
return.
The remainder of the functions are defined in file DigraphADT_List.hs
..
We can create an empty labelled digraph g0
having Int
identifiers for vertices, Int
labels for
vertices, and Int
labels for
edges as follows:
= (new_graph :: Digraph Int Int Int) g0
Then we can add a new vertex with identifier 1
and vertex
label 101
as
follows:
= add_vertex g0 1 101 g1
TODO: Consider whether to make any of the following changes to the specification and implementation above.
Based on the list-based design and implementation above, what improvements should we consider? Here are some possibilities.
As described above, the current list implementations of functions
such as remove_vertex
and update_vertex
do some unnecessary work
with respect to the implementation invariant. This could be
eliminated.
The data representation (i.e., implementation invariant) could be
changed to allow, for example, multiple occurrences of vertices in the
vertex list. This would avoid the checks of has_vertex
in add_vertex
and update_vertex
. Then, as it does above,
remove_vertex
needs to remove
all occurrences of the vertex.
Other functions would need to be modified accordingly so that they
only access the first occurrence of a vertex (especially the all_vertices
and all_vertices_labels
functions).
A similar change could be made to the list of edges.
Note: The Labelled Diagraph ADT specification does not specify what the behavior should be when the referenced vertex or edge is not defined. The change suggested in this item gives non-error behavior to those situations. Perhaps a better alternative would be to change the general ADT specification to require specific behaviors in those cases.
Most of the functions throw an error
exception when the vertex they reference does not exist. A better
Haskell design would redefine these functions to return a Maybe
or Either
value.
This would eliminate most of the has_vertex
checks and make the
functions defined on all possible inputs.
This would require changes to the overall Labelled Digraph ADT specification and its abstract interface.
New functions could be added to the Labelled Digraph ADT—such as an equality check on graphs, a constructor that creates a copy of an existing graph, or functions to apply various graph algorithms.
Existing functions could be eliminated. For example, if the graph is only constructed and used for retrieval, then the remove and update functions could be eliminated.
This section gives an implementation of the ADT that uses a Haskell
Map
to
map a vertex to the set of outgoing edges from that vertex
We represent the Map implementation of the Labelled Digraph ADT as an
instance of the Haskell algebraic data type Digraph
as
shown below. (Remember that type variable a
is VertexType
,
b
is VertexLabelType
,
and c
is EdgeLabelType
.)
import qualified Data.Map.Strict as M
data Digraph a b c = Graph (M.Map a (b,[(a,c)]))
In the data constructor (Graph m)
,
m
is an instance of Data.Map.Strict
.
This collection is set of key-value pairs implemented as a balanced
tree, giving logarithmic access time.
An instance of (Graph m)
corresponds to the abstract model as follows:
The keys for the Map m
collection are of VertexLabelType
.
The interface invariant requires that VertexType
be
in class Eq
. The
implementation based on Data.Map.Strict
further constrains vertices to be in subclass Ord
because
the vertices are the keys of the Map
.
TODO: Consider restricting the Digraph spec to require
Ord
.
Map m
is
defined for all keys v1
in vertex set V
and
undefined for all other keys.
For some vertex v1
, the value of m
at
key v1
is a pair (l,es)
where
l
is an element of VertexLabelType
and is the unique label associated with v1
, that is,
l = VL(v1)
.
es
is the list of all tuples (v2,el)
such that (v1,v2) IN E
, el IN
EdgeLabelType
,
and el = EL((v1,v2))
. That is, (v1,v2)
is an
edge and el
is its unique label.
Given that all the type parameters must be of class Show
, we also
define Digraph
to
also be of class Show
as
defined below.
instance (Show a, Show b, Show c) => Show (Digraph a b c) where
show (Graph m) = "(Digraph " ++ show (M.toAscList m) ++ ")"
Given the above description, we then define the following implementation (representation) invariant for the list-based version of the Labelled Digraph ADT:
Any Haskell
Digraph
value(Graph m)
with abstract modelG = (V,E,VL,EL)
, appearing in either the arguments or return value of an operation, must also satisfy the following:
(ForAll v1, l, es ::
( m(v1) defined && m(v1) == (l,es) ) <=>
( VL(v1) == l &&
(ForAll v2, el :: (v2,el) IN es <=>
EL((v1,v2)) == el) ) )
The code in this section shows a map-based implementation for the same operations we examined for the list-based implementation.
The Haskell module for the map representation of the Labelled Digraph
ADT is in source file DigraphADT_Map.hs
.. A simple smoke
test driver module is in source file DigraphADT_TestMap.hs
..
Constructor new_graph
and
accessors is_empty
and
has_vertex
are just wrappers for functions from Data.Map.Strict
.
new_graph :: (Ord a, Show a, Show b, Show c) =>
Digraph a b c
= Graph M.empty
new_graph
is_empty :: (Ord a, Show a, Show b, Show c) =>
Digraph a b c -> Bool
Graph m) = M.null m
is_empty (
has_vertex :: (Ord a, Show a, Show b, Show c) =>
Digraph a b c -> a -> Bool
Graph m) ov = M.member ov m has_vertex (
To add a new vertex and label to the graph, add_vertex
must return a graph with
the new key-value pair inserted into the existing graph’s
Map
. The value consists of the label paired with a nil list
of adjacent edges. To meet the specification, it must not allow a vertex
to be added if the vertex already occurs in the list.
add_vertex :: (Ord a, Show a, Show b, Show c) =>
Digraph a b c -> a -> b -> Digraph a b c
@(Graph m) nv nl
add_vertex g| not (has_vertex g nv) = Graph (M.insert nv (nl,[]) m)
| otherwise = error has_nv
where has_nv =
"Vertex " ++ show nv ++ " already in digraph"
Except for making sure the vertex to be deleted is the graph,
function remove_vertex
is just a
wrapper for the Data.Map.Strict.delete
function.
remove_vertex :: (Ord a, Show a, Show b, Show c) =>
Digraph a b c -> a -> Digraph a b c
@(Graph m) ov
remove_vertex g| has_vertex g ov = Graph (M.delete ov m)
| otherwise = error no_ov
where no_ov = "Vertex " ++ show ov ++ " not in digraph"
If the argument vertex is in the graph, then function update_vertex
retrieves its old label
and edge list and then reinserts the new label paired with the same edge
list.
update_vertex :: (Ord a, Show a, Show b, Show c) =>
Digraph a b c -> a -> b -> Digraph a b c
@(Graph m) ov nl
update_vertex g| has_vertex g ov =
Graph (M.insert ov (upd (M.lookup ov m)) m)
| otherwise = error no_ov
where upd (Just (ol,edges)) = (nl,edges)
= error no_entry
upd _ = "Vertex " ++ show ov ++ " not in digraph"
no_ov =
no_entry "Missing/malformed value for vertex " ++ show ov
For an existing vertex, function get_vertex
retrieves the associated
value and extracts the label.
get_vertex :: (Ord a, Show a, Show b, Show c) =>
Digraph a b c -> a -> b
@(Graph m) ov
get_vertex g| has_vertex g ov = getlabel (M.lookup ov m)
| otherwise = error no_ov
where getlabel (Just (ol,_)) = ol
= "Vertex " ++ show ov ++ " not in digraph" no_ov
The remainder of the functions are defined in file DigraphADT_Map.hs
.
The Map-based functions can be called in the same manner as the
List-based function, except that the vertices must be in class Ord
.
All the improvements suggested for the list-based implementation apply to the map-based implementation except for the first.
For large graphs, the map-based implementation should perform better than the list-based implementation.
For large graphs with many outgoing edges on each vertex, it might be
useful to implement the edge-list itself with a Map
.
This chapter (22) revisited the issues of specification, design, and implementation of data abstractions as modules in Haskell. It used a labelled digraph data structure as the example.
Although we may not specify all subsequent Haskell modules as systematically as we did in this chapter, we do use the modular style of programming in the various interpreters developed in Chapter 41 and following.
In the future, we plan to implement a Adventure game on top of the ADT implemented in this chapter.
TODO
TODO: If the Maybe
improvement is not done above for
errors, put that here as an exercise.
Restate the preconditions and postconditions for functions from_edges
and from_edges
so that they must return
empty lists when the argument vertex v1
is not in the vertex set. (See the
notes on these operations in the semantic specification above.)
Develop a comprehensive test script for the Labelled Digraph ADT implementations using blackbox, module-level, functional testing as described in Chapters 11 and 12.
Adapt the Haskell Labelled Digraph ADT interface and it two implementations to use GHC’s Backpack module system.
Specify a similar Labelled Digraph ADT as a Java interface.
Give two different implementations of the Labelled Digraph ADT in Java using the specification from the previous exercise.
Specify a similar Labelled Digraph ADT as a Python 3 module.
Give two different implementations of the Labelled Digraph ADT in Python using the specification from the previous exercise.
Choose one of the improvements described in the “Improvements in the list implementation” subsection and change the specification and list implementation as needed for the improvement.
Choose one of the improvements and change the specification and map implementation as needed for the improvement.
Give a full specification (similar to the one for the Labeled Digraph ADT) in this chapter) for the Carrie’s Candy Bowl project in Chapter 21<!-22—>. That is, give the name, set, signatures, and constructive semantics. If helpful, you may use the mathematical concept of bag.
Specify a doubly labelled directed multigraph data structure to replace the doubled labelled digraph. (That is, allow multiple directed edges from one vertex to another.)
Give an implementation of the doubly labelled directed multigraph specified in the previous exercise.
In this project, you are asked to design and implement Haskell modules to represent Mealy Machines and to simulate their execution.
This kind of machine is a useful abstraction for simple controllers that listen for input events and respond by generating output events. For example in an automobile application, the input might be an event such as “fuel level low” and the output might be command to “display low-fuel warning message”.
In the theory of computation, a Mealy Machine is a finite-state automaton whose output values are determined both by its current state and the current input. It is a deterministic finite state transducer such that, for each state and input, at most one transition is possible.
Appendix A of the Linz textbook [5] defines a Mealy Machine mathematically by a tuple
where
is a finite set of internal states
is the input alphabet (a finite set of values)
is the output alphabet (a finite set of values)
is the transition function
is the output function
is the initial state of (an element of )
In an alternative formulation, the transition and output functions can be combined into a single function:
We often find it useful to picture a finite state machine as a transition graph where the states are mapped to vertices and the transition function represented by directed edges between vertices labelled with the input and output symbols.
Specify, design, and implement a general representation for a Mealy Machine as a Haskell module implementing an abstract data type. It should hide the representation of the machine and should have, at least, the following public operations.
newMachine s
creates a
new machine with initial (and current) state s
and no transitions.
Note: This assumes that the state, input, and output sets are exactly those added with the mutator operations below. An alternative would be to change this function to take the allowed state, input, and output sets.
addState m s
adds a new
state s
to machine m
and returns an Either
wrapping the modified machine or an error message.
addTransition m s1 in out s2
adds a new transition to machine m
and returns an Either
wrapping the modified machine or an error message. From state s1
with input in
the
modified machine outputs out
and
transitions to state s2
.
addResets m
adds all
reset transitions to machine m
and returns the modified machine. From state s1
on input in
the
modified machine outputs out
and
transitions to state s2
. This
operation makes the transition function a total function by adding any
missing transitions from a state back to the initial state.
setCurrent m s
sets the
current state of machine m
to
s
and returns an Either
wrapping the modified machine or an error message.
getCurrent m
returns the
current state of machine m
.
getStates m
returns a
list of the elements of the state set of machine m
.
getInputs m
returns a
list of the input set of machine m
.
getOutputs m
returns a
list of the output set of machine m
.
getTransitions m
returns
a list of the transition set of machine m
. Tuple (s1,in,out,s2)
occurs in the returned list if and only if, from state s1
with input in
, the
machine outputs out
and moves to
state s2
.
getTransitionsFrom m s
returns an Either
wrapping a list of the set of transitions enabled from state s
of machine m
or an error message.
Given the above implementation for a Mealy Machine, design and implement a separate Haskell module that simulates the execution of a Mealy Machine. It should have, at least, the following new public operations.
move m in
moves machine m
from the current
state given input in
and returns
an Either
wrapping a tuple (m',out)
or
an error message. The tuple gives the modified machine m'
and the output out
.
simulate m ins
simulates
execution of machine m
from its
current state through a sequence of moves for the inputs in list ins
and returns an Either
wrapping a tuple (m',outs)
or an error message. The tuple gives the modified machine m'
after the sequence of moves and
the output list outs
.
Note: It is possible to use a Labelled Digraph ADT module in the implementation of the Mealy Machine.
Implement a Haskell module that uses a different representation for the Mealy Machine. Make sure the simulator module still works correctly.
In Spring 2017, I created a Labelled Digraph ADT document by adapting and revising comments from the Haskell implementations of the Labelled Digraph abstract data type. I had specified the ADT and developed the implementations as my solution for Assignment #1 in CSci 556 (Multiparadigm Programming) in Spring 2015. I also included some content from my notes on Data Abstraction [3].
(In addition to the list- and map-based Haskell implementations of the Labelled Digraph ADT, I developed a list-based implementation in Elixir in Spring 2015 and two Scala-based implementations in Spring 2016.)
In Spring 2017, I also created a Mealy Machine Simulator Exercise document by adapting and revising a project I had assigned in the Scala-based offering of CSci 555 (Functional Programming) in Spring 2016.
In 2018, I merged and revised these documents to become new Chapter 22, Data Abstraction Revisited, in the textbook Exploring Languages with Interpreters and Functional Programming.
I retired from the full-time faculty in May 2019. As one of my post-retirement projects, I am continuing work on this textbook. In January 2022, I began refining the existing content, integrating additional separately developed materials, reformatting the document (e.g., using CSS), constructing a bibliography (e.g., using citeproc), and improving the build workflow and use of Pandoc.
I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.
Data abstraction; abstract data type (ADT), instance; specification of ADTs using name, sets, signatures, and semantics; constructor, accessor, mutator, and destructor operations; axiomatic and constructive semantics; abstract model (contract, precondition, postcondition, interface and implementation invariant, abstract interface); use of Haskell module hiding features to implement the abstract data type’s interface; using mathematical concepts to model the data abstraction (graph, digraph, labelled graph, multigraph, set, sequence, bag, total and partial functions, relation); graph data structure; adventure game.
Mealy Machine, simulator, finite-state automaton (machine), deterministic finite state transducer, state, transition, transition graph.