Exploring Languages with Interpreters
and Functional Programming
Chapter 7

H. Conrad Cunningham

11 September 2018

Copyright (C) 2016, 2017, 2018, H. Conrad Cunningham
Professor of Computer and Information Science
University of Mississippi
211 Weir Hall
P.O. Box 1848
University, MS 38677
(662) 915-5358

Browser Advisory: The HTML version of this textbook requires a browser that supports the display of MathML. A good choice as of September 2018 is a recent version of Firefox from Mozilla.

7 Data Abstraction

7.1 Chapter Introduction

Chapter 2 introduced the concepts of procedural and data abstraction. Chapter 6 focuses on procedural abstraction and modular design and programming. This chapter focuses on data abstraction.

The goals of this chapter are to:

7.2 Using Data Abstraction

Data abstraction can help make a program robust with respect to change in the data. As in the previous chapter, let’s begin the study of this design technique with an example.

7.2.1 Rational number arithmetic

For this example, let’s implement a group of Haskell functions to perform rational number arithmetic, assuming that the Haskell library does not contain such a data type.

In mathematics we usually write rational numbers in the form 𝚑𝚒\frac{\texttt{x}}{\texttt{y}} where x and y are integers and y β‰ \neq 0.

For now, let us assume we have a special type Rat to represent rational numbers and a constructor function

    makeRat :: Int -> Int -> Rat

to create a Haskell rational number instance from a numerator x and a denominator y. That is, makeRat x y constructs a Haskell rational number with mathematical value 𝚑𝚒\frac{\texttt{x}}{\texttt{y}}, where πš’β‰ 0\texttt{y} \neq 0.

Let us also assume we have selector functions numer and denom with the signatures:

    numer, denom :: Rat -> Int

Functions numer and denom take a valid Haskell rational number and return its numerator and denominator, respectively.

Requirement:
For any Int values x and y where πš’β‰ 0\texttt{y} \neq 0, there exists a Haskell rational number r such that makeRat x y == r and rational number values πš—πšžπš–πšŽπš› πš›πšπšŽπš—πš˜πš– πš›\frac{\texttt{numer r}}{\texttt{denom r}} == 𝚑𝚒\frac{\texttt{x}}{\texttt{y}}.

Note: In this example, we use fraction notation like 𝚑𝚒\frac{\texttt{x}}{\texttt{y}} to denote the mathematical value of the rational number. In constrast, r above denotes a Haskell value representing a rational number.

We consider how to implement rational numbers in Haskell later, but for now let’s look at rational arithmetic implemented using the constructor and selector functions specified above.

Given our knowledge of rational arithmetic from mathematics, we can define the operations for unary negation, addition, subtraction, multiplication, division, and equality as follows. We assume that the operands x and y are values created by the constructor makeRat.

    negRat :: Rat -> Rat 
    negRat x = makeRat (- numer x) (denom x)

    addRat, subRat, mulRat, divRat :: Rat -> Rat -> Rat  -- (1)
    addRat x y = makeRat (numer x * denom y + numer y * denom x)
                         (denom x * denom y) 
    subRat x y = makeRat (numer x * denom y - numer y * denom x)
                         (denom x * denom y) 
    mulRat x y = makeRat (numer x * numer y) (denom x * denom y) 
    divRat x y  -- (2) (3)
        | eqRat y zeroRat = error "Attempt to divide by 0"
        | otherwise       = makeRat (numer x * denom y)
                                    (denom x * numer y)

    eqRat :: Rat -> Rat -> Bool 
    eqRat x y = (numer x) * (denom y) == (numer y) * (denom x)

The above code:

  1. combines the type signatures for all four arithmetic operations into a single declaration by listing the names separated by commas

  2. introduces the parameterless function zeroRat to abstract the constant rational number value 0

    Note: We could represent zero as makeRat 0 1 but choose to introduce a separate abstraction.

  3. calls the error function for an attempt to divide by zero

These arithmetic functions do not depend upon any specific representation for rational numbers. Instead, they use rational numbers as a data abstraction defined by the type Rat, constant zeroRat, constructor function makeRat, and selector functions numer and denom.

The goal of a data abstraction is to separate the logical properties of data from the details of how the data are represented.

7.2.2 Rational number data representation

Now, how can we represent rational numbers?

For this package, we define type synonym Rat to denote this type:

    type Rat = (Int, Int) 

For example, (1,7), (-1,-7), (3,21), and (168,1176) all represent the value 𝟷𝟽\frac{\texttt{1}}{\texttt{7}}.

As with any value that can be expressed in many different ways, it is useful to define a single canonical (or normal) form for representing values in the rational number type Rat.

It is convenient for us to choose a Haskell rational number representation (x,y) that satisfies all parts of the following Rational Representation Property:

By relatively prime, we mean that the two integers have no common divisors except 1.

This representation keeps the magnitudes of the numerator x and denominator y small, thus reducing problems with overflow arising during arithmetic operations.

This representation also gives a unique representation for zero. For convenience, we define the name zeroRat to represent this constant:

    zeroRat :: (Int,Int)
    zeroRat = (0,1)

We can now define constructor function makeRat x y that takes two Int values (for the numerator and the denominator) and returns the corresponding Haskell rational number in this canonical form.

    makeRat :: Int -> Int -> Rat
    makeRat x 0 = error ( "Cannot construct a rational number "
                          ++ show x ++ "/0" )         -- (1)
    makeRat 0 _ = zeroRat
    makeRat x y = (x' `div` d, y' `div` d)            -- (2)
        where x' = (signum' y) * x                    -- (3,4)
              y' = abs' y 
              d  = gcd' x' y'

In the definition of makeRat, we use features of Haskell we have not used in the previous examples. the above code:

  1. uses the infix ++ (read β€œappend”) operator to concatenate two strings

    We discuss ++ in the chapter on infix operations.

  2. puts backticks (`) around an alphanumeric function name to use that function as an infix operator

    The function div denotes integer division. Above the div operator denotes the integer division function used in an infix manner.

  3. uses a where clause to introduce x', y', and d as local definitions within the body of makeRat

    These local definition can be accessed from within makeRat but not from outside the function. In contrast, sqrtIter in the Square Root example is at the same level as sqrt', so it can be called by other functions (in the same Haskell module at least).

    The where feature allows us to introduce new definitions in a top-down mannerβ€”first using a symbol and then defining it.

  4. uses type inference for local variables x', y', and d instead of giving explicit type definitions

    These parameterless functions could be declared

    but it was not necessary because Haskell can infer the types from the types involved in their defining expressions.

    Type inference can be used more broadly in Haskell, but explicit type declarations should be used for any function called from outside.

We require that makeRat x y satisfy the precondition:

    y /= 0

The function generates an explicit error exception if it does not.

As a postcondition, we require makeRat x y to return a result (x',y') such that:

Note: Together the two postcondition requirements imply that πš‘β€™πš’β€™\frac{\texttt{x'}}{\texttt{y'}} == 𝚑𝚒\frac{\texttt{x}}{\texttt{y}}.

The function signum' (similar to the more general function signum in the Prelude) takes an integer and returns the integer -1, 0, or 1 when the number is negative, zero, or positive, respectively.

    signum' :: Int -> Int 
    signum' n | n == 0     =  0 
              | n > 0      =  1 
              | otherwise  = -1 

The function abs' (similar to the more general function abs in the Prelude) takes an integer and returns its absolute value.

    abs' :: Int -> Int
    abs' n | n >= 0    =  n
           | otherwise = -n

The function gcd' (similar to the more general function gcd in the Prelude) takes two integers and returns their greatest common divisor.

    gcd' :: Int -> Int -> Int 
    gcd' x y = gcd'' (abs' x) (abs' y) 
        where gcd'' x 0 = x 
              gcd'' x y = gcd'' y (x `rem` y) 

Prelude operation rem returns the remainder from dividing its first operand by its second.

Given a tuple (x,y) constructed by makeRat as defined above, we can define numer (x,y) and denom (x,y) as follows:

    numer, denom :: Rat -> Int
    numer (x,_) = x
    denom (_,y) = y

The preconditions of both numer (x,y) and denom (x,y) are that their arguments (x,y) satisfy the Rational Representation Property.

The postcondition of numer (x,y) = x is that the rational number values πš‘πš—πšžπš–πšŽπš› (𝚑,𝚒)\frac{\texttt{x}}{\texttt{numer (x,y)}} == 𝚑𝚒\frac{\texttt{x}}{\texttt{y}}.

Similarly, the postcondition of denom (x,y) = y is that the rational number values πšπšŽπš—πš˜πš– (𝚑,𝚒)y\frac{\texttt{denom (x,y)}}{y} == 𝚑𝚒\frac{\texttt{x}}{\texttt{y}}.

Finally, to allow rational numbers to be displayed in the normal fractional representation, we include function showRat in the package. We use function show, found in the Prelude, here to convert an integer to the usual string format and use the list operator ++ to concatenate the two strings into one.

    showRat :: Rat -> String
    showRat x = show (numer x) ++ "/" ++ show (denom x)

Unlike Rat, zeroRat, makeRat, numer, and denom, function showRat (as implemented) does not use knowledge of the data representation. We could optimize it slightly by allowing it to access the structure of the tuple directly.

7.2.3 Modularization

There are three groups of functions in this package:

  1. the six public rational arithmetic functions negRat, addRat, subRat, mulRat, divRat, and eqRat

  2. the public type Rat, constant zeroRat, public constructor function makeRat, public selector functions numer and denom, and string conversion function showRat

  3. the private utility functions called only by the second group, but just reimplementations of Prelude functions anyway

7.2.3.1 Module RationalCore

As we have seen, data type Rat; constant zeroRat; functions makeRat, numer, denom, and showRat; and the functions’ preconditions and postconditions form the interface to the data abstraction.

The data abstraction hides the information about the representation of the data. We can encapsulate this group of functions in a Haskell module as follows. This source code must also be in a file named RationalCore.hs.

    module RationalCore
        (Rat, makeRat, zeroRat, numer, denom, showRat)
    where
        -- Rat,makeRat,zeroRat,numer,denom,showRat definitions

In terms of the information-hiding approach, the secret of the RationalCore module is the rational number data representation used.

We can encapsulate the utility functions in a separate module, which would enable them to be used by several other modules.

However, given that the only use of the utility functions is within the data representation module, we choose not to separate them at this time. We leave them as local functions in the data abstraction module. Of course, we could also eliminate them and use the corresponding Prelude functions directly.

7.2.3.2 Module Rational

Similarly, functions negRat, addRat, subRat, mulRat, divRat, and eqRat use the core data abstraction and, in turn, extend the interface to include rational number arithmetic operations.

We can encapsulate these in another Haskell module that imports the module giving the data representation. This module must be in a file named Rational1.hs.

    module Rational1
      ( Rat, zeroRat, makeRat, numer, denom, showRat,
        negRat, addRat, subRat, mulRat, divRat, eqRat )
    where
        import RationalCore
        -- negRat,addRat,subRat,mulRat,divRat,eqRat definitions 

Other modules that use the rational number package can import module Rational1.

7.2.3.3 Modularization critique

The modularization described above (potentially):

7.2.4 Alternative data representation

In the rational number data representation above, constructor makeRat creates pairs in which the two integers are relatively prime and the sign is on the numerator. Selector functions numer and denom just return these stored values.

An alternative representation is to reverse this approach, as shown in the following module (in file RationalDeferGCD.hs.)

    module RationalDeferGCD
        (Rat, zeroRat, makeRat, numer, denom, showRat)
    where

    type Rat = (Int,Int)

    zeroRat :: (Int,Int)
    zeroRat = (0,1)

    makeRat :: Int -> Int -> Rat
    makeRat x 0 = error ( "Cannot construct a rational number "
                          ++ show x ++ "/0" )
    makeRat 0 y = zeroRat 
    makeRat x y = (x,y)

    numer :: Rat -> Int
    numer (x,y) = x' `div` d
        where x' = (signum' y) * x 
              y' = abs' y 
              d  = gcd' x' y'

    denom :: Rat -> Int
    denom (x,y) = y' `div` d
        where x' = (signum' y) * x 
              y' = abs' y 
              d  = gcd' x' y'

    showRat :: Rat -> String
    showRat x = show (numer x) ++ "/" ++ show (denom x)

This approach defers the calculation of the greatest common divisor until a selector is called.

In this alternative representation, a rational number (x,y) must satisfy all parts of the following Deferred Representation Property:

We require that makeRat x y satisfies the precondition:

    y /= 0

The function generates an explicit error condition if it does not.

As a postcondition, we require makeRat x y to return a result (x',y') such that:

The preconditions of both numer (x,y) and denom (x,y) are that (x,y) satisfies the Deferred Representation Property.

The postcondition of numer (x,y) = x' is that the rational number values πš‘β€™πš—πšžπš–πšŽπš› (𝚑,𝚒)\frac{\texttt{x'}}{\texttt{numer (x,y)}} == 𝚑𝚒\frac{\texttt{x}}{\texttt{y}}.

Similarly, the postcondition of denom (x,y) = y is that the rational number values πšπšŽπš—πš˜πš– (𝚑,𝚒)yβ€²\frac{\texttt{denom (x,y)}}{y'} == 𝚑𝚒\frac{\texttt{x}}{\texttt{y}}.

Question:

What are the advantages and disadvantages of the two data representations?

Like module RationalCore, the design secret for this module, RationalDeferGCD, is the rational number data representation.

Regardless of which approach is used, the definitions of the arithmetic and comparison functions do not change. Thus the Rational module can import data representation module RationalCore or RationalDeferGCD.

Figure 7-1 shows the dependencies among the modules we have examined in the rational arithmetic example.

Figure 7-1. Rational Package Module Dependencies
Figure 7-1. Rational Package Module Dependencies

We can consider the RationalCore and RationalDeferGCD modules as two concrete instances (Haskell module s) of a more abstract module we call RationalRep in the diagram.

The module Rational relies on the abstract module RationalRep for an implementation of rational numbers. In the Haskell code above, there are really two versions of the Haskell module Rational that differ only in whether they import RationalCore or RationalDeferGCD.

We could also replace alias Rat by a user-defined type to get another alternative definition of RationalRep, as long as the interface functions do not have to work with types other than Int.

7.2.5 Haskell information-hiding modules

In the Rational Arithmetic example, we defined two information-hiding modules:

  1. β€œRationalRep”, whose secret is how to represent the rational number data and whose interface consists of the data type Rat, constant zeroRat, operations (functions) makeRat, numer, denom, and showRat, and the constraints on these types and functions

  2. β€œRational”, whose secret is how to implement the rational number arithmetic and whose interface consists of operations (functions) negRat, addRat, subRat, mulRat, divRat, and eqRat, the other module’s interface, and the constraints on these types and functions

We developed two distinct Haskell modules, RationalCore and RationalDeferGCD, to implement the β€œRationalRep” information-hiding module.

We developed one distinct Haskell module, Rational, to implement the β€œRational” information-hiding module. This module can be paired (i.e.Β by changing the import statement) with either of the other two variants of β€œRationalRep” module. (Source file Rational1.hs imports module RationalCore; source file Rational2.hs imports RationalDeferGCD.)

Unfortunately, Haskell 2010 has a relatively weak module system that does not support multiple implementations as well as we might like. There is no way to declare that multiple Haskell modules have the same interface other than copying the common code into each module and documenting the interface carefully. We must also have multiple versions of Rational that differ only in which other module is imported.

Together the Glasgow Haskell Compiler (GHC) release 8.2 (July 2017) and the Cabal-Install package manager release 2.0 (August 2017) support a new extension, the Backpack mixin package system. This new system remedies the above shortcoming. In this new approach, we would define the abstract module β€œRationalRep” as a signature file and require that RationalCore and RationalDeferGCD conform to it.

Further discussion of this new module system is beyond the scope of this chapter.

7.2.6 Testing

Chapter 12 discusses testing of the Rational modules designed in this chapter. The test scripts for:

7.3 Invariants

As we see in the rational arithmetic example, a module that provides a data abstraction must ensure that the objects it creates and manipulates maintain their integrityβ€”always have a valid structure and state.

These properties are invariants for those modules. An invariant for the data abstraction can help us design and implement such objects.

Invariant:

A logical assertion that must always be true for every β€œobject” created by the public constructors and manipulated only by the public operations of the data abstraction.

Often, we separate an invariant into two parts.

Interface invariant:

An invariant stated in terms of the public features and abstract properties of the β€œobject”.

Implementation (representation) invariant:

A detailed invariant giving the required relationships among the internal features of the implementation of an β€œobject”

An interface invariant is a key aspect of the abstract interface of a module. It is useful to the users of the module, as well to the developers.

7.3.1 RationalRep modules

In the Rational Arithmetic example, the interface invariant for the β€œRationalRep” abstract module is the following.

RationalRep Interface Invariant:

For any valid Haskell rational number r, all the following hold:

We note that the precondition for makeRat x y is defined above without any dependence upon the concrete representation.

    y /= 0

We can restate the postcondition for makeRat x y = r generically to require both of the following to hold:

The preconditions of both numer r and denom r are that their argument r satisfies the RationalRep Interface Invariant.

The postcondition of numer r = x' is that the rational number value πš‘β€™πšπšŽπš—πš˜πš– πš›\frac{\texttt{x'}}{\texttt{denom r}} is equal to the rational number value of r.

Similarly, the postcondition of denom r = y' is that the rational number value πš—πšžπš–πšŽπš› πš›yβ€²\frac{\texttt{numer r}}{y'} is equal to the rational number value of r.

An implementation invariant guides the developers in the design and implementation of the internal details of a module. It relates the internal details to the interface invariant.

7.3.1.1 RationalCore

We can state an implementation invariant for the RationalCore module.

RationalCore Implementation Invariant:

For any valid Haskell rational number r, all the following hold:

The implementation invariant implies the interface invariant given the definitions of data type Rat and selector functions numer and denom. Constructor function makeRat does the work to establish the invariant initially.

7.3.1.2 RationalDeferGCD

We can state an implementation invariant for the RationalDeferGCD module.

RationalDeferGCD Implementation Invariant:

For any valid Haskell rational number r, all the following hold:

The implementation invariant implies the interface invariant given the definitions of Rat and of the selector functions numer and denom. Constructor function makeRat is simple, but the selector functions numer and denom do quite a bit of work to establish the interface invariant.

7.3.2 Rational modules

The Rational abstract module extends the RationalRep abstract module with new functionality.

7.4 What Next?

The previous chapter examined procedural abstraction and stepwise refinement for development of a square root package.

This chapter examined data abstraction for development of a rational number arithmetic package. The chapters explored concepts and methods for modular design and programming using Haskell, including preconditions, postconditions, and invariants.

The next chapter examines the substitution model for evaluation of Haskell programs and explores efficiency and termination in the context of that model.

A later chapter examines how to test the modules developed in this example.

7.5 Exercises

For each of the following exercises, develop and test a Haskell function or set of functions.

  1. Develop a Haskell module (or modules) for line segments on the two-dimensional coordinate plane using the rectangular coordinate system.

    We can represent a line segment with two pointsβ€”the starting point and the ending point. Develop the following Haskell functions:

    We normally represent the plane with a rectangular coordinate system. That is, we use two axesβ€”an x axis and a y axisβ€”intersecting at a right angle. We call the intersection point the origin and label it with 0 on both axes. We normally draw the x axis horizontally and label it with increasing numbers to the right and decreasing numbers to the left. We also draw the y axis vertically with increasing numbers upward and decreasing numbers downward. Any point in the plane is uniquely identified by its x-coordinate and y-coordinate.

    Define a data representation for points in the rectangular coordinate system and develop the following Haskell functions:

    Now, using the various constructors and selectors, also develop the Haskell functions for line segments:

    Note that newSeg, startPt, endPt, midPt, and showSeg can be implemented independently from how the points are represented.

  2. Develop a Haskell module (or modules) for line segments that represents points using the polar coordinate system instead of the rectangular coordinate system used in the previous exercise.

    A polar coordinate system represents a point in the plane by its radial coordinate r (i.e.Β the distance from the pole) and its angular coordinate t (i.e.Β the angle from the polar axis in the reference direction). We sometimes call r the magnitude and t the angle.

    By convention, we align the rectangular and polar coordinate systems by making the origin the pole, the positive portion of the x axis the polar axis, and let the first quadrant (where both x and y are positive) be the smallest positive angles in the reference direction. That is, with a traditional drawing of the coordinate systems, we measure and the radial coordinate r as the distance from the origin measure the angular coordinate t counterclockwise from the positive x axis.

    Using knowledge of trigonometry, we can convert among rectangular coordinates (x,y) and polar coordinates (r,t) using the equations:

    Define a data representation for points in the polar coordinate system and develop the following Haskell functions:

    Functions newSeg, startPt, endPt, midPt, and showSeg should work as in the previous exercise.

  3. Modify the solutions to the previous two line-segment module exercises to enable the line segment functions to be in one module that works properly if composed with either of the two data representation modules. (The solutions may have already done this.)

  4. Modify the solution to the previous line-segment exercise to use the Backpack module system.

  5. Modify the modules in the previous exercise to enable the line segment module to work with both data representations in the same program.

  6. Modify the solution to the Rational Arithmetic example to use the Backpack module system.

  7. State preconditions and postconditions for the functions in abstract module Rational.

7.6 Acknowledgements

In Summer and Fall 2016, I adapted and revised much of this work from my previous materials:

In 2017, I continued to develop this work as Sections 2.6-2.7 in Chapter 2, Basic Haskell Functional Programming, of my 2017 Haskell-based programming languages textbook.

In Spring and Summer 2018, I divided the previous Basic Haskell Functional Programming chapter into four chapters in the 2018 version of the textbook, now titled Exploring Languages with Interpreters and Functional Programming. Previous sections 2.1-2.3 became the basis for new Chapter 4, First Haskell Programs; previous Section 2.4 became Section 5.3 in the new Chapter 5, Types; and previous sections 2.5-2.7 were reorganized into new Chapter 6, Procedural Abstraction, and Chapter 7, Data Abstraction (this chapter).

I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.

7.7 References

[Abelson 1996]:
Harold Abelson and Gerald Jay Sussman. Structure and Interpretation of Computer Programs (SICP), Second Edition, MIT Press, 1996.
[Cunningham 2014]:
H. Conrad Cunningham. Notes on Functional Programming with Haskell, 1993-2014.
[Cunningham 2018a]:
H. Conrad Cunningham. Notes on Data Abstraction, 1996-2018.
[Cunningham 2018b]:
H. Conrad Cunningham. Notes on Modular Design, 1996-2018.

7.8 Terms and Concepts

Haskell module, module exports and imports, module dependencies, rational number arithmetic, data abstraction, properties of data, data representation, precondition, postcondition, invariant, interface invariant, implementation or representation invariant, canonical or normal forms, relatively prime, information hiding, module secret, encapsulation, interface, abstract interface, type inference.