7 April 2019
Browser Advisory: The HTML version of this textbook requires a browser that supports the display of MathML. A good choice as of April 2019 is a recent version of Firefox from Mozilla.
TODO: These Scala-based notes are not as consistent and fluent as they should be. Revise them further.
These notes examine how to specify, design, and implement abstract data types in Scala.
The goals of these notes are to:
review the key concepts of modular design and programming (e.g. modules, interfaces, information hiding, and contracts)
explore how to specify abstract data types using a constructive approach based on an abstract model of the data type
illustrate how to use Scala features (e.g traits and classes) to define appropriate interfaces and implement information-hiding modules
examine the design and implementation of a nontrivial abstract data type (i.e. a doubly labelled directed graph)
The concepts and terminology in this chapter are mostly general. They are applicable to most any language. Here we look specifically at Scala. (I have implemented basically the same data abstraction module in Haskell and Elixir.)
In the provocative 1986 essay “No Silver Bullet—Essence and Accidents in Software Engineering,” software engineering pioneer Fred Brooks asserts that “building software will always be hard” because software systems are inherently complex, must conform to all sorts of physical, human, and software interfaces, must change as the system requirements evolve, and are inherently invisible entities [Brooks 1986].
A decade later Brooks again observes, “The best way to attack the essence of building software is not to build it at all” [Brooks 1995]. That is, software engineers should reuse both software and, more importantly, software designs.
What was true in the 1980s is still true today. Although software development tools and practices have evolved, removing some of the “accidental” properties of software development, the essential difficulties remain. Complexity continues to increase. The interfaces to which software must conform continue to change quickly and increase in number. The requirements on software continue to evolve, driven by inexorable changes in the environment and increasing penetration of computerized processes into new aspects of society. Globalization generates new requirements, which arise from both new opportunities and new competition.
We often develop software systems in multiperson teams. In many cases, the teams are geographically distributed, perhaps even across national boundaries. Communication among team members adds complexity to software development.
How should we approach software development in this contemporary context?
As a starting point, let us again look to a software engineering pioneer: David Parnas. Parnas focuses on how to decompose a system into modules to achieve robustness with respect to change and potential reuse of software. He stresses the clarity of thought more than the sophistication of languages and tools.
Although Parnas and his colleagues published their ideas on modular specification in the 1970s and 1980s [Parnas 1972, 1976, 1979, 1985; Britton 1981], the ideas are as relevant today as they were when first published.
Parnas defines a module as “a work assignment given to a programmer or group of programmers” [Parnas 1978]. This is a software engineering view of a module.
In a programming language, a “module” may also be a program unit defined with a construct or convention. This is a programming language view of a module.
Ideally, a language’s module features should support the software engineering module methods.
According to Parnas, the goals of modular design are to [Parnas 1972]:
enable programmers to understand the system by focusing on one module at a time (i.e. comprehensibility).
shorten development time by minimizing required communication among groups (i.e. independent development).
make the software system flexible by limiting the number of modules affected by significant changes (i.e. changeability).
Parnas advocates the use of a principle he called information hiding to guide decomposition of a system into appropriate modules (i.e. work assignments). He points out that the connections among the modules should have as few information requirements as possible [Parnas 1972].
In the Parnas approach, an information-hiding module:
forms a cohesive unit of functionality separate from other modules
hides a design decision—its secret—from other modules
encapsulates an aspect of system likely to change (its secret)
Aspects likely to change independently of each other should become secrets of separate modules. Aspects unlikely to change can become interactions (connections) among modules.
This approach supports the goal of changeability (goal 2). When care is taken to design the modules as clean abstractions with well-defined and documented interfaces, the approach also supports the goals of independent development (goal 1) and comprehensibility (goal 3).
Information hiding has been absorbed into the dogma of contemporary object-oriented programming. However, information hiding is often oversimplified as merely hiding the data and their representations [Weiss 2001].
The secret of a well-designed module may be much more than that. It may include such knowledge as a specific functional requirement stated in the requirements document, the processing algorithm used, the nature of external devices accessed, or even the presence or absence of other modules or programs in the system [Parnas 1972, 1979, 1985]. These are important aspects that may change as the system evolves.
Let’s review the concepts of precondition and postcondition, which introduced in the notes on Recursion Styles, Correctness, and Efficiency (Scala Version) [Cunningham 2019b].
The precondition of a function is what the caller (i.e. the client of the function) must ensure holds when calling the function. A precondition may specify the valid combinations of values of the arguments. It may also record any constraints on any “global” state that the function accesses or modifies.
If the precondition holds, the supplier (i.e. developer) of the function must ensure that the function terminates with the postcondition satisfied. That is, the function returns the required values and/or alters the “global” state in the required manner.
We sometimes call the set of preconditions and postconditions for a function the contract for that function. These concepts underlie Meyer’s design by contract approach to software development [Meyer 1997].
It is important for information-hiding modules to have well-defined and stable interfaces.
According to Britton, Parker, and Parnas, an interface is a “set of assumptions … each programmer needs to make about the other program … to demonstrate the correctness of his own program” [Britton 1981].
A module’s interface includes the type signatures (i.e. names, arguments, and return values), preconditions, and postconditions of all public operations (e.g. functions).
As we see later, the interface also includes the invariant properties of the data values and structures manipulated by the module and the definitions of any new data types exported by the module. An invariant must be part of the precondition of public operations except operations that construct elements of the data type (i.e. constructors). It must also be part of the postcondition of public operations except operations that destroy elements of the data type (i.e. destructors).
In Scala, we often use a trait
, a class
, or an object
to implement the module.
An abstract interface is an interface that does not change when one module implementation is substituted for another [Britton 1981; Parnas 1978]. It concentrates on module’s essential aspects and obscures incidental aspects that vary among implementations.
Information-hiding modules and abstract interfaces enable us to design, build, and test software systems with multiple versions. The information-hiding approach seeks to identify aspects of a software design that might change from one version to another and to hide them within independent modules behind well-defined abstract interfaces.
We can reuse the software design across several similar systems. We can reuse an existing module implementation when appropriate. When we need a new implementation, we can create one by following the specification of the module’s abstract interface.
In Scala, we often use a trait
(or an abstract class
) to specify an abstract interface in concrete form so that we can use it for several “modules”.
The design and implementation of information-hiding modules must be approached from two points of view simultaneously [Meyer 1997]:
The client-supplier relationship is as represented in the following diagram:
TODO: Provide a better drawing below.
________________ ________________
| | | |
| Client |===USES===>| Supplier |
|________________| |________________|
(module user) (module)
The supplier’s concerns include:
efficient and reliable algorithms and data structures
convenient implementation
easy maintenance
The clients’ concerns include:
accomplishing their own tasks
using the supplier module without effort to understand its internal details
having a sufficient, but not overwhelming, set of operations.
As we have noted previously, the interface of a module is the set of features (i.e., public operations) provided by a supplier to clients.
A precise description of a supplier’s interface forms a contract between clients and supplier.
The client-supplier contract:
gives the responsibilities of the client
These are the conditions under which the supplier must deliver results—when the preconditions of the operations are satisfied (i.e. the operations are called correctly).
gives the responsibilities of the supplier
These are the benefits the supplier must deliver—make the postconditions hold at the end of the operation (i.e. the operations deliver the correct results).
The contract:
protects the client by specifying how much must be done by the supplier
protects the supplier by specifying how little is acceptable to the client
If we are both the clients and suppliers in a design situation, we should consciously attempt to separate the two different areas of concern, switching back and forth between our supplier and client “hats”.
What else should we consider in designing a good interface for an information-hiding module (e.g. a Scala trait
, class
, or object
)?
In designing an interface for a module, we should also consider the following criteria. Of course, some of these criteria conflict with one another; a designer must carefully balance the criteria to achieve a good interface design.
Note: These are general principles; they are not limited to Scala or functional programming. In object-oriented languages, these criteria apply to class interfaces, whether the instances are immutable or mutable.
Cohesion: All operations must logically fit together to support a single, coherent purpose. The module should describe a single abstraction.
Simplicity: Avoid needless features. The smaller the interface the easier it is to use the module.
No redundancy: Avoid offering the same service in more than one way; eliminate redundant features.
Atomicity: Do not combine several operations if they are needed individually. Keep independent features separate. All operations should be primitive, that is, not be decomposable into other operations also in the public interface.
Completeness: All primitive operations that make sense for the abstraction should be supported by the module.
Consistency: Provide a set of operations that are internally consistent in
naming convention (e.g., in use of prefixes like “set” or “get”, in capitalization, in use of verbs/nouns/adjectives),
use of arguments and return values (e.g., order and type of arguments),
behavior (i.e. make operations work similarly).
Avoid surprises and misunderstandings. Consistent interfaces make it easier to understand the rest of a system if part of it is already known.
The operations should be consistent with good practices for the specific language being used.
Reusability: Do not customize modules to specific clients, but make them general enough to be reusable in other contexts.
Robustness with respect to modifications: Design the interface of an module so that it remains stable even if the implementation of the module changes. (That is, it should be an abstract interface for an information-hiding module as we discussed above.)
Convenience: Where appropriate, provide additional operations (e.g., beyond the complete primitive set) for the convenience of users of the module. Add convenience operations only for frequently used combinations after careful study.
We must trade off conflicts among the criteria. For example, we must balance:
completeness versus simplicity
reusability versus simplicity
convenience versus consistency, simplicity, no redundancy, and atomicity
We must also balance these design criteria against efficiency and functionality.
The remainder of these notes use the term abstract data type to refer to a data abstraction. A data abstraction “module” defines and exports a user-defined type and a set of operations on that type. The type is abstract in the sense that its concrete representation is hidden; only the module’s operations may manipulate the representation directly.
For convenience, these notes sometimes use acronym ADT to refer to an abstract data type. The term abstract data type or acronym ADT should not be confused with algebraic data type, which we have discussed previously. We specify an algebraic data type with rules on how to compose and decompose them—primarily with syntax. We specify an abstract data type with rules about how the operations behave in relation to one another—primarily with semantics.
In these notes, we develop a family of doubly labelled digraph data structures.
As a graph, the data structure consists of a finite set of vertices (nodes) and a set of edges. Each edge connects two vertices.
Note: Some writers require that the set of vertices be nonempty, but here we prefer to allow an empty graph to have no vertices. (But the question remains whether such a graph with no vertices is pointless concept.)
As a directed graph (or digraph), each pair of vertices has at most one edge connecting them; the edge has a direction from one of the edges to the other.
As a doubly labelled graph, each vertex and each edge has some user-defined data (i.e. labels) attached.
These notes draw on the discussion of digraphs and their specification in Chapters 1 and 10 of the Dale and Walker book Abstract Data Types [Dale 1996].
For what purpose can we use a doubly labelled digraph data structure?
One concrete use case is to represent the game world in an implementation of an adventure game.
For example, in the Wizard’s Adventure Game from Chapter 5 of reference [Barski 2011], the game’s rooms become vertices, passages between rooms become edges, and descriptions associated with rooms or passages become labels on the associated vertex or edge (as shown in Figure 1).
Aside: By using a digraph to model the game world, we disallow multiple passages directly from one room to another. By changing the graph to a multigraph, we can allow multiple directed edges from one vertex to another.
The Adventure game must create and populate the game world initially, but it does not typically modify the game world during play. It maintains the game state (e.g. player location) separately from the game world. A player moves from room to room during play; the labelled digraph gives the static structure and descriptions of the game world.
How can we define an abstract data type?
The behavior of an ADT is defined by a set of operations that can be applied to an instance of the ADT (e.g. a Scala object).
Each operation of an ADT can have inputs (i.e. parameters) and outputs (i.e. results). The collection of information about the names of the operations and their inputs and outputs is the interface of the ADT.
To specify an ADT, we need to give:
the name of the ADT
the sets (or domains) upon which the ADT is built. These include the type being defined and the auxiliary types (e.g. primitive data types and other ADTs) used as parameters or return values of the operations.
the signatures (syntax or structure) of the operations
the semantics (or meaning) of the operations
We categorize an ADT’s operations into four groups depending upon their functionality:
A constructor (sometimes called a creator, factory, or producer function) constructs and initializes an instance of the ADT.
A mutator (sometimes called a modifier, command, or setter function) returns the instance with its state changed.
An accessor (sometimes called an observer, query, or getter function) returns information from the state of an instance without changing the state.
A destructor destroys an instance of the ADT.
We normally list the operations in that order.
If we use immutable data structures, then a mutator returns a distinct new instance of the ADT with a state that is a modified version of the original instance’s state. That is, we take an applicative (or functional or referentially transparent) approach to ADT specifications.
If we use mutable data structures, then a mutator can change the state of an instance in place. That may be more efficient, but it tends to be less safe. It also tends to make concurrent use of an abstract data type more problematic.
Technically speaking, a destructor is not an operation of the ADT. We can represent the other types of operations as functions on the sets in the specification. However, we cannot define a destructor in that way. But destructors are of pragmatic importance in the implementation of ADTs, particularly in languages that do not have automatic storage reclamation (i.e. garbage collection).
There are two primary approaches for specifying the semantics of the operations:
The axiomatic (or algebraic) approach gives a set of logical rules (properties or axioms) that relate the operations to one another. The meanings of the operations are defined implicitly in terms of each other.
The constructive (or abstract model) approach describes the meaning of the operations explicitly in terms of operations on other abstract data types. The underlying model may be any well-defined mathematical model or a previously defined ADT.
In some ways, the axiomatic approach is the more elegant of the two approaches. It is based in the well-established mathematical fields of abstract algebra and category theory. Furthermore, it defines the new ADT independently of other ADTs. To understand the definition of the new ADT it is only necessary to understand its axioms, not the semantics of a model.
However, in practice, the axiomatic approach to specification becomes very difficult to apply in complex situations. The constructive approach, which builds a new ADT from existing ADTs, is the more useful methodology for most practical software development situations.
In these notes, we use the constructive approach. We use contracts—preconditions, postconditions, and invariants—to specify the semantics of the operations.
A module that implements an ADT must ensure that the objects it creates and manipulates maintain their integrity—always have a valid structure and state.
These properties are invariant for the ADT operations. An invariant for the data abstraction can help us design and implement such objects.
A logical assertion that must always be true for every “object” created by the public constructors and manipulated only by the public operations of the data abstraction.
An invariant is a precondition of all operations except constructors and a postcondition of all operations except destructors.
Often, we separate an invariant into two parts.
An invariant stated in terms of the public features and abstract properties of the “object”.
A detailed invariant giving the required relationships among the internal features of the implementation of an “object”
An interface invariant is a key aspect of the abstract interface of an ADT module. It is useful to the users of the module, as well to the developers.
It is important to note that an invariant is not required to hold:
for private operations (e.g. functions, procedures, or methods) within an implementation of an ADT
at any point within the implementation of an operation except the beginning and the end
Now let’s look at a constructive specification of the doubly labelled digraph.
First, we specify the ADT as an implementation-independent abstraction. The secret of the ADT module is the data structure used internally to implement the doubly labelled digraph.
Then, we examine two implementations of the abstraction:
using Scala lists to represent the vertex and edge sets
using a Scala Map
to map a vertex to the set of outgoing edges from that vertex
Before we specify the ADT, let’s define the mathematical notation we use. We choose notation that can readily be used in comments in program.
We use the following notation and terminology to describe the abstract data type’s model and its semantics. (In this case study, we use notation that can be easily included as textual comments in source code rather than using special mathematical and logical symbols.)
TODO: Perhaps either expand this to be more general or contract it to just the concepts needed for these notes. Perhaps change these notes to use standard mathematical symbols.
(ForAll x, y :: p(x,y))
is true if and only if predicate p(x,y)
is true for all values of x
and y
.
(Exists x, y :: p(x,y))
is true if and only if there is at least one pair of values x
and y
for which p(x,y)
is true.
(# x, y :: p(x,y))
yields a count of pairs (x,y)
for which p(x,y)
is true.
<=>
denotes logical equivalence. p <=> q
is true if and only if the logical (Boolean) values p
and q
are equal (i.e. both true or both false).
x IN C
is true if and only if value x
is member of a collection C
(such as a set, bag, or sequence). Similarly, x NOT_IN C
denotes the negation of x IN C
.
A type consists of a set of values and a set of operations. We sometimes say a value is IN
a type to mean the value is IN
the set associated with the type.
For sets C
and D
, C UNION D
denotes set union, that is, a set that includes all the element of both C
and D
.
For sets C
and D
, C INTERSECT D
denotes set intersection, that is, a set that includes all elements that are both in C
and in D
.
For sets C
and D
, C - D
denotes set difference, that is, the set C
with all elements of set D
removed.
For sets C
and D
, C SUBSET_OF D
denotes that C
is subset of D
, that is, all the elements of C
also occur in D
.
A Cartesian product of two sets C
and D
is the set of all ordered pairs (x,y)
where x IN C
and y IN D
.
A tuple such as (x,*)
appearing in a collection such as { (x,*) }
denotes element x
grouped with all possible values of the second component. Note: We could also write { (x,*) }
using a quantification as:
{ (x,c) :: c IN some_domain }
A relation on sets C
and D
is a subset of the Cartesian product of C
and D
. That is, a set of tuples.
A function on sets C
and D
is a special case of a relation on C
and D
where each value from C
occurs in at most one tuple in the relation.
A total function is defined for all elements of its domain. A partial function is defined for a subset of the elements of its domain.
The abstract data type being defined is named Digraph
.
We specify that this abstract data type be represented by a Scala generic type
which has three type parameters (i.e. sets):
Type (or set) VertexType
denotes the possible vertices (i.e. vertex identifiers) in the Digraph
.
Type (or set) VertexLabelType
denotes the possible labels on vertices in the Digraph
.
Type (or set) EdgeLabelType
denotes the possible labels on edges in the Digraph
.
Given this ADT defines a digraph, edges can be identified by ordered pairs (tuples) of vertices.
Values of the above types, in particular the labels, may have several components.
Values the above types, in particular the labels, may have several components.
We define the following operations on the Labelled Digraph ADT. We specify these as methods on a Scala trait Digraph
. The methods have an implicit argument this
, which is an object from a concrete class that extends the trait (i.e. the receiver or target object for the method).
For conciseness, we use short generic parameter names A
, B
, and C
for VertexType
, VertexLabelType
, and EdgeLabelType
, respectively.
TODO: Probably should use better generic parameters than A, B, C.
Given the primary use case described above, we require a zero-parameter constructor that creates an empty graph. A concrete class that extends the trait may include other constructors.
Given the primary use case described above, we specify mutators to add a new vertex (addVertex
) and add a new edge between existing vertices (addEdge
{.scala).
We also specify mutators to remove a vertex (removeVertex
), remove an edge (removeEdge
), update the labels on a vertex (updateVertex
, and update the label on an edge (updateEdge
). (Note: In the identified use case, these are likely used less often than the mutators that add new vertices and edges. But we include them for completeness.)
def removeVertex(ov:A): Digraph[A,B,C]
def removeEdge(v1:A, v2:A): Digraph[A,B,C]
def updateVertex(ov:A, nl:B): Digraph[A,B,C]
def updateEdge(v1:A, v2:A, nl:C): Digraph[A,B,C]
Note: To remove a vertex requires us to remove any edges attached to that vertex.
We specify query functions to check whether the labelled digraph is empty (isEmpty
), has a given vertex (hasVertex
), and has an edge between two vertices (hasEdge
).
We specify accessors to retrieve the label associated with a given vertex (getVertex
) and with a given edge (getEdge
).
def getVertexLabel(ov:A): B
def getEdgeLabel(v1:A, v2:A): C
Given the identified use case, we also specify accessors to return a list of all vertices in the graph (allVertices
) and of the pairing of all vertices with their labels (allVerticesLabels
).
Similarly, we specify accessors to return a list of all outgoing edges from a given vertex (fromEdges
) and of the pairing of all outgoing edges with their edge labels (fromEdgesLabels
). Here we identify outgoing edge by the “to” vertex for that edge.
Question: What other operations might be useful?
We do not specify any separate destructors. We rely on the garbage collection of the objects. (In some cases, concrete classes that implement abstract data types may need to implement explicit destructors to deallocate resources such as open files, network connections, etc.)
We model the state of the instance of the Labelled Digraph ADT with an abstract value G
such that G = (V,E,VL,EL)
with G
’s components satisfying the following Labelled Digraph Properties.
V
is a finite subset of values from the set VertexType
. V
denotes the vertices (or nodes) of the digraph.
Any two elements of V
can be compared for equality.
Some implementations of the Digraph
may further require that the set V
be:
totally ordered—supporting <
and the other relational operators
hashable
E
is a binary relation on the set V
. A pair (v1,v2) IN E
denotes that there is a directed edge from v1
to v2
in the digraph.
Note that this model allows at most one (directed) edge from a vertex v1
to vertex v2
. It allows a directed edge from a vertex to itself.
Also, because vertices can be compared for equality, any two edges can also be compared for equality.
VL
is a total function from set V
to the set VertexLabelType
.
EL
is a total function from set E
to the set EdgeLabelType
.
We define the following interface invariant for the Labelled Digraph ADT:
Any valid labelled digraph instance
G
, appearing in either the arguments or return value of a public ADT operation, must satisfy the Labelled Digraph Properties.
We specify the various ADT operations below using their type signatures, preconditions, and postconditions. Along with the interface invariant, these comprise the (implementation-independent) specification of the ADT (i.e. its abstract interface).
In these assertions, for a digraph object this
that satisfies the invariants, G(this)
denotes its abstract model (V,E,VL,EL)
as described above. The value Result
denotes the return value of method.
Given this family of implementations uses immutable graph objects, every operation, both accessor and mutator, has a postcondition conjunct
G(this) = G(this')
where this
denotes the receiver object before the operation and this'
denotes the receiver object after the operation.
For an implementation allowing mutable graph objects, this requirement may be relaxed for some mutators. However, it should still hold for all accessors.
A parameterless constructor creates and returns a new empty instance of the graph ADT.
Precondition:
true
Postcondition:
G(Result) == ({},{},{},{})
For a Scala class constructor, the Result
is a new instance of the data type created by the constructor. Thus the postcondition is equivalent to G(this') == ({},{},{},{})
.
Mutator method
inserts vertex nv
with label nl
into graph this
and returns the resulting graph.
Precondition:
G(this) = (V,E,VL,EL) && nv NOT_IN V
Postcondition:
G(Result) == (V UNION {nv}, E, VL UNION {(nv,nl)}, EL)
Note: The set operation VL UNION {(nv,nl)}
redefines the function (i.e. a type of relation) VL
so that vertex nv
now maps to vertex label nl
(where there was no mapping beforehand).
If the precondition is true, then the method must terminate with the postcondition being true. The specification does not say what must occur if the precondition is false.
Consider the following questions:
Can we remove the nv NOT_IN V
conjunct in the precondition? (That is, can we weaken the precondition in this way?)
If so, what are ways we can modify the postcondition to handle the new input states? (That is, how do we strengthen the postcondition by adding new conjuncts for the new input states?)
Would it be appropriate to redefine the operation’s signature and semantics to return an object of type Option[Diagraph[A,B,C]]
? of some Either
type? What are advantages and disadvantages of this kind of change?
Mutator method
deletes vertex ov
from graph this
and returns the resulting graph.
Precondition:
G(this) = (V,E,VL,EL) && ov IN V
Postcondition:
G(Result) == (V', E', VL', EL')
where V' = V - {ov}
E' = E - {(ov,*),(*,ov)}
VL' = VL - {(ov,*)}
EL' = EL - {((ov,*),*),((*,ov),*)}
This operation also removes all edges attached to the vertex removed.
Question: Can we remove the ov IN V
conjunct in the precondition? If so, what are ways we can modify the postcondition to handle the new input states?
Mutator method
changes the label on vertex ov
in graph this
to be nl
and returns the resulting graph.
Precondition:
G(this) = (V,E,VL,EL) && ov IN V
Postcondition:
G(Result) == (V - {ov}, E, VL', EL)
where VL' = (VL - {(ov,VL(ov))}) UNION {(ov,nl)}
Question: Can we remove the ov IN V
conjunct in the precondition? If so, what are ways we can modify the postcondition to handle the new input states?
Mutator method
inserts an edge from vertex v1
to vertex v2
with label nl
in graph this
and returns the resulting graph.
Precondition:
G(this) = (V,E,VL,EL) && v1 IN V && v2 IN V &&
(v1,v2) NOT_IN E
Postcondition:
G(Result) == (V, E', VL, EL')
where E' = E UNION {(v1,v2)}
EL' = EL UNION {((v1,v2),nl)}
Question: Can we weaken the precondition?
Mutator method
deletes the edge from vertex v1
to vertex v2
from graph this
and returns the resulting graph.
Precondition:
G(this) = (V,E,VL,EL) && (v1,v2) IN E
Postcondition:
G(Result) == (V, E - {(v1,v2)}, VL, EL - { ((v1,v2),*) }
Question: Can we weaken the precondition?
Mutator method
changes the label on the edge from vertex v1
to vertex v2
in graph this
to have label nl
and then returns the resulting graph.
Precondition:
G(this) = (V,E,VL,EL) && (v1,v2) IN E
Postcondition:
G(Result) == (V, E, VL, EL')
where EL' == (EL - {((v1,v2),*)}) UNION {((v2,v2),nl)
Question: Can we weaken the precondition?
Accessor method
returns true
if and only if graph this
is empty.
Precondition:
G(this) = (V,E,VL,EL)
Postcondition:
Result == (V == {} && E == {})
Accessor method
returns true
if and only if ov
is a vertex of graph this
.
Precondition:
G(this) = (V,E,VL,EL)
Postcondition:
G(Result) == ov IN V
Accessor method
def hasEdge(v1:A, v2:A): Boolean
returns true
if and only if there is an edge from a vertex v1
to a vertex v2
in graph this
.
Precondition:
G(this) = (V,E,VL,EL)
Postcondition:
Result == (v1,v2) IN E
Accessor method
returns the label from vertex ov
in graph this
Precondition:
G(this) = (V,E,VL,EL) && ov IN V
Postcondition:
Result == VL(ov)
Question: Can we weaken the precondition?
Accessor method
returns the label on the edge from vertex v1
to vertex v2
in graph this
.
Precondition:
G(g) = (V,E,VL,EL) && (v1,v2) IN E
Postcondition:
Result == EL((v1,v2))
Question: Can we weaken the precondition?
Accessor method
returns a List of all the vertices in graph this
.
Precondition:
G(this) = (V,E,VL,EL)
Postcondition:
(ForAll ov:: ov IN Result <=> ov IN V) &&
length(Result) == size(V)
This returns the set V
as a Scala List
without duplicates. The order of the elements in the list is not specified. (Remember that the set VertexType
is not necessarily partially or totally ordered, but it does support equality comparisons)
Accessor method
returns a sequence of all vertices v2
such that there is an edge from vertex v1
to vertex v2
in graph this
.
Precondition:
G(this) = (V,E,VL,EL) && v1 IN V
Postcondition:
(ForAll v2:: v2 IN Result <=> (v1,v2) IN E) &&
length(Result) == (# v2 :: (v1,v2) IN E)
Question: Can we remove the precondition conjunct v1 IN V
?
Method call fromEdges(v1)
probably should return List()
when v1
does not appear in this
. This will make it can work well with the Wizard’s Adventure game. To do so, we can redefine the precondition and postcondition to specify appropriate behavior.
Revised Precondition (weaker):
G(this) = (V,E,VL,EL)
Revised Postcondition (stronger):
(v1 NOT_IN V => Result == List())
&&
(v1 IN V =>
(ForAll v2:: v2 IN Result <=> (v1,v2) IN E) &&
length(Result) == (# v2 :: (v1,v2) IN E) )
Accessor method
returns a sequence of all pairs (v,l)
such that v
is a vertex and l
is it’s label in graph this
.
Precondition:
G(this) = (V,E,VL,EL)
Postcondition:
(ForAll v, l:: (v,l) IN Result <=> (v,l) IN VL) &&
length(Result) == size(VL)
Accessor method
returns a List of all pairs (v2,l)
such that there is an edge (v1,v2)
labelled with l
in graph this
.
Precondition:
G(this) = (V,E,VL,EL) && v1 IN V
Postcondition:
(ForAll v2, l :: (v2,l) IN Result <=> ((v1,v2),l) IN EL)
&& length(Result) == (# v2 :: (v1,v2 ) IN E)
Question: Can we remove the precondition conjunct v1 IN V
?
Method call fromEdgesLabels(v1)
probably should return List()
when v1
does not appear in this
. This will make it can work well with the Wizard’s Adventure game. To do so, we can redefine the precondition and postcondition to specify appropriate behavior.
Revised Precondition (weaker):
G(this) = (V,E,VL,EL)
Revised Postcondition (stronger):
(v1 NOT_IN V => Result == List())
&&
(v1 IN V =>
(ForAll v2, l :: (v2,l) IN Result <=> ((v1,v2),l) IN EL)
&& length(Result) == (# v2 :: (v1,v2 ) IN E) )
We specify the abstract interface for the family of Scala implementations of the Digraph
ADT using a generic Scala trait Digraph[A,B,C]
, defined in file Digraph.scala
.
The trait defines the interface to the mutator and accessor methods. The construtor and destructor methods are left for the implementing class to implement. We document the semantics as comments in the Scala source code.
This section gives an implementation of the ADT that uses Scala lists to represent the vertex and edge sets.
Conceptually, the approach taken here is a variation on the adjacency matrix representation of a graph. However, unlike the typical presentation in an algorithms and data structures course, the approach here does not restrict the set of vertices to be from a finite range of integers.
We represent the List implementation of the Labelled Digraph ADT as a a Scala class DigraphList[A,B,C]
that extends Digraph[A,B,C]
. (Remember that type variable A
is VertexType
, B
is VertexLabelType
, and C
is EdgeLabelType
.)
We use the following primary constructor for this Scala class.
class DigraphList[A,B,C] private // private constructor
(private val vs: List[(A,B)],
private val es: List[(A,A,C)])
extends Digraph[A,B,C] { ... }
Above we use Scala features that we have not used before.
We declare a Scala constructor parameter (i,e. object field) as var
, val
, or neither.
var
means that the Scala compiler automatically generates a mutable instance variable with both public getter and setter methods.
val
means that the Scala compiler automatically generates an immutable instance variable with a public getter method.
Neither means that any variable that the Scala compiler generates will be immutable and will not have automatically generated public getter or setter methods. If the parameter is not used outside the initialization of the class, then no instance variable is generated.
We can add the modifier private
to Scala constructor parameters: private var
or private val
. This means the compiler will not generate public getter and setter methods.
We can declare the primary constructor as private
. That means it can only be called from inside the class. (In such a case, we may need to have an auxiliary constructor that is public.)
Now let’s consider how we use these fields to represent the abstract data type.
In an instance DigraphList(vs,es)
:
vs
is a list of tuples (v,vl)
where
v
has type VertexType
(i.e. A
) and represents a vertex of the digraphvl
has type VertexLabelType
(i.e. B
) and is the label associated with vertex v
v
occurs at most once in vs
(i.e. vs
encodes a function from vertices to vertex labels)Note: A list list vs
is sometimes called an association list because it associates a key (e.g. v
{.scala]) with its value (e.g. vl
).
es
is a list of tuples (v1,v2,el)
where
v1
and v2
are vertices occurring in vs
, representing a directed edge from v1
to v2
el
has type EdgeLabelType
(i.e. C
) and is the label associated with edge (v1,v2)
(v1,v2)
occurs at most once in es
(i.e. es
encodes a function from edges to edge labels)In terms of the abstract model, vs
encodes VL
directly and, because VL
is a total function on V
, it encodes V
indirectly. Similarly, es
encodes EL
directly and E
indirectly.
Of course, there are many other ways to represent the graph as lists. This representation is biased for a context where, once built, the labelled digraph is relatively static and the most frequent operations are the retrieval of labels attached to vertices or edges. That is, it is biased toward the Adventure game use case
Given the above description, we then define the following implementation (representation) invariant for the list-based version of the Labelled
Any Scala Digraph value
DigraphList(vs,es)
with abstract modelG = (V,E,VL,EL)
, appearing in either the implicit or explicit parameters or return value of an operation, must also satisfy the following:
(ForAll v, l :: (v,l) IN vs <=> (v,l) IN VL ) &&
(ForAll v1, v2, m :: (v1,v2,m) IN es <=> ((v1,v2),m) IN EL )
See the Digraph List implementation at DigraphList.scala
and some testing code at Test_DigraphList.scala
.
In addition to the private primary constructor discussed above, DigraphList
implements a public zero-parameter auxiliary constructor that returns a new empty instance of the Digraph ADT. This is the public constructor required by the Digraph
specification.
This constructor has the following contract, as required by the abstract specification:
true
G(this') == ({},{},{},{})
Auxiliary constructors in Scala must call another constructor, eventually calling the primary constructor.
As a convenience, DigraphList
also implements a public copy constructor that constructs this instance to have same state as an existing instance.
This constructor has the following contract:
G(dg) = (V,E,VL,EL)
G(this') == G(dg)
Based on the list-based design and implementation above, what improvements should we consider?
Here are some possibilities.
The current list implementations of methods such as removeVertex
and updateVertex
do some unnecessary work with respect to the implementation invariant. This could be eliminated.
The data representation (i.e. implementation invariant) could be changed to allow, for example, multiple occurrences of vertices in the vertex list. This would avoid the checks of hasVertex
in addVertex
and updateVertex
. Then, as it does above, removeVertex
needs to remove all occurrences of the vertex.
Other functions would need to be modified accordingly so that they only access the first occurrence of a vertex (especially the allVertices
and allVerticesLabels
methods).
A similar change could be made to the list of edges.
Most of the methods throw a sys.error
exception when the vertex they reference does not exist.
A better Scala functional design would be to redefine these functions to return an Option
or Either
value. This would eliminate most of the hasVertex
checks and make the functions defined on all possible inputs.
Alternatively, we could redefine the methods to give other meaningful behavior in those circumstances, as suggested by some of the questions in the ADT specification.
Either approach would require changes to the overall Labelled Digraph ADT specification and its abstract interface.
New methods could be added to the Labelled Digraph ADT—such as an equality check on graphs or functions to apply various graph algorithms.
Alternatively, these methods could be a separate abstraction layered on top of the existing ADT specification.
Existing methods could be eliminated. For example, if the graph is only constructed and used for retrieval, then the remove and update functions could be eliminated.
This section gives an implementation of the ADT that uses an instance of scala.collection.immutable.HashMap
to map a vertex to the set of outgoing edges from that vertex,
The approach taken here is also a variation on the adjacency list representation of a graph.
We represent the Map implementation of the Labelled Digraph ADT as a a Scala class DigraphMap[A,B,C]
that extends Digraph[A,B,C]
. (Remember that type variable A
is VertexType
, B
is VertexLabelType
, and C
is EdgeLabelType
.)
We use the following primary constructor for this Scala class.
import scala.collection.immutable.HashMap
class DigraphMap[A,B,C] private
(private val m: HashMap[A,(B,List[(A,C)])])
extends Digraph[A,B,C] { ... }
This implementation represents a labeled digraph as an instance of the Scala class DigraphMap(m)
, where m
is a Scala Map
implemented as a hash array mapped trie (i.e. HashMap
).
Note: The HashMap
collection requires that its keys be hashable.
An instance of DigraphMap(m)
corresponds to the abstract model as follows:
The keys for Map m
are from VertexType
(i.e. A
).
Map m
is defined for all keys v1
in vertex set V
and undefined for all other possible keys.
For some vertex v1
, the value of m
at key v1
is a pair (l,es)
where
l
is an element of VertexLabelType
(i.e. B
) and is the label associated with v1
, that is, l == VL(v1)
.
es
is the list of all tuples (v2,el)
such that (v1,v2) IN E
, el IN EdgeLabelType
(i.e. B
{.scala]), and el == EL((v1,v2))
. That is, (v1,v2)
is an edge and el
is its label.
Given the above description, we then define the following implementation (representation) invariant for the map-based version of the Labelled Digraph ADT:
Any Scala Digraph value
DigraphMap(m)
with abstract modelG = (V,E,VL,EL)
, appearing in either the implicit or explicit parameters or return value of an operation, must also satisfy the following:
(ForAll v1, l, es ::
( m(v1) defined && m(v1) == (l,es) ) <=>
( VL(v1) == l &&
(ForAll v2, el :: (v2,el) IN es <=>
EL((v1,v2)) == el) ) )
See the Digraph Map implementation at DigraphMap.scala
and some testing code at Test_DigraphMap.scala
.
As with the Digraph List implementation, the Map implementation has two auxiliary constructors—a zero-parameter constructor and a copy constructor.
All the improvements suggested for the list-based implementation apply to the map-based implementation except for the first.
For large graphs, the map-based implementation should perform better than the list-based implementation.
For large graphs with many outgoing edges on each vertex, it might be useful to implement the edge-list itself with a HashMap
.
Alternatively, a HashMap
could use use pairs (v1,v2)
for its keys. This would give a more direct analog to an array-based adjacency matrix representation. The vertex labels could be represented separately by a list or map that associates a vertex with its label.
In this project, you are asked to design and implement Scala “modules” to represent Mealy Machines and to simulate their execution.
A Mealy Machine is a useful abstraction for simple controllers that listen for input events and respond by generating output events. For example in an automobile application, the input might be an event such as “fuel level low” and the output might be command to “display low-fuel warning message”.
In the theory of computation, a Mealy Machine is a finite-state automaton whose output values are determined both by its current state and the current input. It is a deterministic finite state transducer such that, for each state and input, at most one transition is possible.
Appendix A of the Linz textbook [Linz 2017] defines a Mealy Machine mathematically by a tuple
where
is a finite set of internal states
is the input alphabet (a finite set of values)
is the output alphabet (a finite set of values)
is the transition function
is the output function
is the initial state of (an element of )
In an alternative formulation, the transition and output functions can be combined into a single function:
We often find it useful to picture a finite state machine as a transition graph where the states are mapped to vertices and the transition function represented by directed edges between vertices labelled with the input and output symbols.
Specify, design, and implement a general representation for a Mealy Machine as a set of Scala definitions implementing an abstract data type. It should hide the representation of the machine behind an abstract interface and should have, at least, the following public operations.
Constructor MealyMachine(s)
creates a new machine with initial (and current) state s
and no transitions.
Mutator method addState(s)
adds a new state s
to this machine and returns an Either
wrapping the modified machine or an error message.
Mutator method addTransition(s1,in,out,s2)
adds a new transition to this machine and returns an Either
wrapping the modified machine or an error message. From state s1
with input in
, the modified machine outputs out
and transitions to state s2
.
Mutator method addResets
adds all reset transitions to this machine and returns the modified machine. This operation makes the transition function a total function by adding any missing transitions from a state back to the initial state.
Mutator method setCurrent(s)
sets the current state of this machine to s
and returns an Either
wrapping the modified machine or an error message.
Accessor method getCurrent
returns the current state of this machine.
Accessor method getStates
returns a list of the elements of the state set of this machine.
Accessor method getInputs
returns a list of the input set of this machine.
Accessor method getOutputs
returns a list of the output set of this machine.
Accessor method getTransitions
returns a list of the transition set of this machine. Tuple (s1,in,out,s2)
occurs in the returned list if and only if, from state s1
with input in
, the machine outputs out
and moves to state s2
.
Accessor method getTransitionsFrom(s)
returns an Either
wrapping a list of the set of transitions enabled from state s
of this machine or an error message.
Note: It is possible to use a Labelled Digraph ADT module in the implementation of the Mealy Machine. A state is a vertex of the graph, transition is an edge of the graph, and an (in,out)
is a label for an edge.
Given the above implementation for a Mealy Machine ADT, design and implement a separate Scala module that simulates the execution of a Mealy Machine. It should have, at least, the following public operations.
Mutator move(m,in)
moves machine m
from the current state given input in
and returns an Either
wrapping a tuple (mm,out)
or an error message. The tuple gives the modified machine mm
and the output out
.
Mutator method simulate(m,ins)
simulates execution of machine m
from its current state through a sequence of moves for the inputs in list ins
and returns an Either
wrapping a tuple (mm,outs)
or an error message. The tuple gives the modified machine mm
after the sequence of moves and the output list outs
.
Implement a Scala abstract data type that uses a different representation for the Mealy Machine. Make sure the simulator module still works correctly.
I specified the Doubly Labelled Digraph ADT for Assignment #1 in CSci 556 (Multiparadigm Programming) in Spring 2015. This assignment was motivated by underlying data structure for the Wizard’s Adventure Game from [Barski 2011]. I developed the list- and map-based Haskell implementations as my solution for that assignment. In addition, I developed a list-based implementation in Elixir and used it to implement the Adventure game. In Spring 2016, I developed list- and map-based Scala implementations for CSci 555 (Functional Programming).
In Spring 2016, I defined the Mealy Machine Simulator as a programming assignment in the Scala-based CSci 555 (Functional Programming) course.
In Spring 2017, I created a Labelled Digraph ADT document by adapting and revising comments from the Haskell implementations of the Labelled Digraph abstract data type. I also included some content from my notes on Data Abstraction [Cunningham 2019]. I also revised the Mealy Machine assignment description to create a Mealy Machine Exercise document.
In 2018, I merged and revised these documents to become a new Chapter 23, Data Abstraction Revisited, in the textbook Exploring Languages with Interpreters and Functional Programming (ELIFP).
In 2019, I adapted Chapter 23 and parts of Chapter 7 of ELIFP and the Scala-based source code comments to develop these notes.
I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.
TODO: Update
Data abstraction; module, goals of modularization (comprehensibility, independent development, changeability), information hiding, secret, encapsulation, contract (precondition, postcondition, invariant), interface, abstract interface; client-supplier relationship, design criteria for interfaces; abstract data type (ADT), instance; specification of ADTs using name, sets, signatures, and semantics; constructor, accessor, mutator, and destructor operations; axiomatic and constructive semantics; abstract model, interface and implementation invariant; using mathematical concepts to model the data abstraction; graph, digraph, labelled graph, multigraph, set, sequence, total and partial functions, relation; adventure game; use of Scala trait and generics to define ADT interface, Scala constructors (private, auxiliary), builtin List and Map (HashMap) data structures; adjacency matrix, adjacency list.
Mealy Machine, simulator, finite-state automaton (machine), deterministic finite state transducer, state, transition, transition graph.