Exploring Languages
with Interpreters
and Functional Programming
Chapter 5
H. Conrad Cunningham
045April 2022
Browser Advisory: The HTML version of this textbook requires a browser that supports the display of MathML. A good choice as of April 2022 is a recent version of Firefox from Mozilla.
The goals of this chapter are to:
examine the general concepts of type systems
explore Haskell’s builtin types
The term type tends to be used in many different ways in programming languages. What is a type?
The chapter on object-based paradigms discusses the concept of type in the context of object-oriented languages. This chapter first examines the concept more generally and then examines Haskell’s builtin types.
Conceptually, a type is a set of values (i.e., possible states or objects) and a set of operations defined on the values in that set.
Similarly, a type S
is (a behavioral) subtype
of type T
if the set of values of type S
is a
“subset” of the values in set T
an set of operations of
type S
is a “superset” of the operations of type
T
. That is, we can safely substitute elements of
subtype S
for elements of type T
because
S
’s operations behave the “same” as T
’s
operations.
This is known as the Liskov Substitution Principle [11,19].
Consider a type representing all furniture and a type representing all chairs. In general, we consider the set of chairs to be a subset of the set of furniture. A chair should have all the general characteristics of furniture, but it may have additional characteristics specific to chairs.
If we can perform an operation on furniture in general, we should be able to perform the same operation on a chair under the same circumstances and get the same result. Of course, there may be additional operations we can perform on chairs that are not applicable to furniture in general.
Thus the type of all chairs is a subtype of the type of all furniture according to the Liskov Substitution Principle.
Now consider the types of the basic program elements.
A constant has whatever types it is defined to have in the
context in which it is used. For example, the constant symbol
1
might represent an integer, a real number, a complex
number, a single bit, etc., depending upon the context.
A variable has whatever types its value has in a particular context and at a particular time during execution. The type may be constrained by a declaration of the variable.
An expression has whatever types its evaluation yields based on the types of the variables, constants, and operations from which it is constructed.
In a statically typed language, the types of a variable or expression can be determined from the program source code and checked at “compile time” (i.e., during the syntactic and semantic processing in the front-end of a language processor). Such languages may require at least some of the types of variables or expressions to be declared explicitly, while others may be inferred implicitly from the context.
Java, Scala, and Haskell are examples of statically typed languages.
In a dynamically typed language, the specific types of a variable or expression cannot be determined at “compile time” but can be checked at runtime.
Lisp, Python, JavaScript, and Lua are examples of dynamically typed languages.
Of course, most languages use a mixture of static and dynamic typing.
For example, Java objects defined within an inheritance hierarchy must
be bound dynamically to the appropriate operations at runtime. Also Java
objects declared of type Object
(the root class of all
user-defined classes) often require explicit runtime checks or
coercions.
In a language with nominal typing, the type of value is
based on the type name assigned when the value is created. Two
values have the same type if they have the same type name. A type
S
is a subtype of type T
only if
S
is explicitly declared to be a subtype of
T
.
For example, Java is primarily a nominally typed language. It assigns types to an object based on the name of the class from which the object is instantiated and the superclasses extended and interfaces implemented by that class.
However, Java does not guarantee that subtypes satisfy the Liskov Substitution Principle. For example, a subclass might not implement an operation in a manner that is compatible with the superclass. (The behavior of subclass objects are this different from the behavior of superclass objects.) Ensuring that Java subclasses preserve the Substitution Principle is considered good programming practice in most circumstances.
In a language with structural typing, the type of a value is based on the structure of the value. Two values have the same type if they have the “same” structure; that is, they have the same public data attributes and operations and these are themselves of compatible types.
In structurally typed languages, a type S
is a subtype
of type T
only if S
has all the public data
values and operations of type T
and the data values and
operations are themselves of compatible types. Subtype S
may have additional data values and operations not in
T
.
Haskell is primarily a structurally typed language.
Polymorphism refers to the property of having “many shapes”. In programming languages, we are primarily interested in how polymorphic function names (or operator symbols) are associated with implementations of the functions (or operations).
In general, two primary kinds of polymorphism exist in programming languages:
Ad hoc polymorphism, in which the same function name (or operator symbol) can denote different implementations depending upon how it is used in an expression. That is, the implementation invoked depends upon the types of function’s arguments and return value.
There are two subkinds of ad hoc polymorphism.
Overloading refers to ad hoc polymorphism in which the language’s compiler or interpreter determines the appropriate implementation to invoke using information from the context. In statically typed languages, overloaded names and symbols can usually be bound to the intended implementation at compile time based on the declared types of the entities. They exhibit early binding.
Consider the language Java. It overloads a few operator symbols, such
as using the +
symbol for both
addition of numbers and concatenation of strings. Java also overloads
calls of functions defined with the same name but different signatures
(patterns of parameter types and return value). Java does not support
user-defined operator overloading; C++ does.
Haskell’s type class mechanism, which we examine in a later chapter, implements overloading polymorphism in Haskell. There are similar mechanisms in other languages such as Scala and Rust.
Subtyping (also known as subtype polymorphism or inclusion polymorphism) refers to ad hoc polymorphism in which the appropriate implementation is determined by searching a hierarchy of types. The function may be defined in a supertype and redefined (overridden) in subtypes. Beginning with the actual types of the data involved, the program searches up the type hierarchy to find the appropriate implementation to invoke. This usually occurs at runtime, so this exhibits late binding.
The object-oriented programming community often refers to inheritance-based subtype polymorphism as simply polymorphism. This the polymorphism associated with the class structure in Java.
Haskell does not support subtyping. Its type classes do support class extension, which enables one class to inherit the properties of another. However, Haskell’s classes are not types.
Parametric polymorphism, in which the same implementation can be used for many different types. In most cases, the function (or class) implementation is stated in terms of one or more type parameters. In statically typed languages, this binding can usually be done at compile time (i.e., exhibiting early binding).
The object-oriented programming (e.g., Java) community often calls this type of polymorphism generics or generic programming.
The functional programming (e.g., Haskell) community often calls this simply polymorphism.
A polymorphic variable is a variable that can “hold” values of different types during program execution.
For example, a variable in a dynamically typed language (e.g., Python) is polymorphic. It can potentially “hold” any value. The variable takes on the type of whatever value it “holds” at a particular point during execution.
Also, a variable in a nominally and statically typed, object-oriented language (e.g., Java) is polymorphic. It can “hold” a value its declared type or of any of the subtypes of that type. The variable is declared with a static type; its value has a dynamic type.
A variable that is a parameter of a (parametrically) polymorphic function is polymorphic. It may be bound to different types on different calls of the function.
The type system is an important part of Haskell; the compiler or interpreter uses the type information to detect errors in expressions and function definitions. To each expression Haskell assigns a type that describes the kind of value represented by the expression.
Haskell has both built-in types (defined in the language or its standard libraries) and facilities for defining new types. In the following we discuss the primary built-in types. As we have seen, a Haskell type name begins with a capital letter.
In this textbook, we sometimes refer to the types Int
, Float
, Double
, Bool
, and
Char
as
being primitive because they likely have direct support in the
host processor’s hardware.
Int
and Integer
The Int
data type
is usually an integer data type supported directly by the host processor
(e.g., 32- or 64-bits on most current processors), but it is guaranteed
to have the range of at least a 30-bit, two’s complement integer.
The type Integer
is an
unbounded precision integer type. Unlike Int
, host
processors usually do not support this type directly. The Haskell
library or runtime system typically supports this type in software.
Haskell integers support the usual literal formats (i.e., constants) and typical operations:
Infix binary operators such as +
(addition),
-
(subtraction), *
(multiplication), and ^
(exponentiation)
Infix binary comparison operators such as ==
(equality
of values), /=
(inequality
of values), <
, <=
, >
, and
>=
Unary operator -
(negation)
For integer division, Haskell provides two-argument functions:
div
such that
div m n
returns the integral quotient truncated toward negative
infinity from dividing m
by
n
quot
such that
quot m n
returns the integral quotient truncated toward 0 from dividing
m
bem n
mod
(i.e.,
modulo) and rem
(i.e., remainder) such that
div m n) * n + mod m n == m
(quot m n)* n + rem m n == m (
To make these definitions more concrete, consider the following
examples. Note that the result of mod
has the
same sign as the divisor and rem
has the
same sign as the dividend.
div 7 3 == 2
quot 7 3 == 2
mod 7 3 == 1 -- same sign as divisor
rem 7 3 == 1 -- same sign as dividend
div (-7) (-3) == 2
quot (-7) (-3) == 2
mod (-7) (-3) == (-1) -- same sign as divisor
rem (-7) (-3) == (-1) -- same sign as dividend
div (-7) 3 == (-3)
quot (-7) 3 == (-2)
mod (-7) 3 == 2 -- same sign as divisor
rem (-7) 3 == (-1) -- same sign as dividend
div 7 (-3) == (-3)
quot 7 (-3) == (-2)
mod 7 (-3) == (-2) -- same sign as divisor
rem 7 (-3) == 1 -- same sign as dividend
Haskell also provides the useful two-argument functions min
and max
, which
return the minimum and maximum of the two arguments, respectively.
Two-arguments functions such as div
, rem
, min
, and max
can be
applied in infix form by including the function name between backticks
as shown below:
5 `div` 3 -- yields 1
5 `rem` 3 -- yields 2
5 `min` 3 -- yields 3
5 `max` 3 -- yields 5
Float
and Double
The Float
and
Double
data types are usually the single and double precision floating point
numbers supported directly by the host processor.
Haskell floating point literals must include a decimal point; they
may be signed or in scientific notation: 3.14159
, 2.0
, -2.0
,
1.0e4
,
5.0e-2
,
-5.0e-2
.
Haskell supports the usual operations on floating point numbers.
Division is denoted by /
as
usual.
In addition, Haskell supports the following for converting floating point numbers to and from integers:
floor
returns
the largest integer less than its floating point argument.
ceiling
returns the smallest integer greater than its floating point
argument
truncate
returns its argumentas an integer truncated toward 0.
round
returns
it argument as an integer rounded away from 0.
fromIntegral
returns its integer argument as a floating point number in a context
where Double
or
Float
is
required. It can also return its Integer
argument as an Int
or vice
versa.
Bool
The Bool
(i.e.,
Boolean) data type is usually supported directly by the host processor
as one or more contiguous bits.
The Bool
literals
are True
and False
. Note
that these begin with capital letters.
Haskell supports Boolean operations such as &&
(and), ||
(or), and
not
(logical negation).
Functions can match against patterns using the Boolean constants. For
example, we could define a function myAnd
as follows:
myAnd :: Bool -> Bool -> Bool
True b = b
myAnd False _ = False myAnd
Above the pattern _
is a
wildcard that matches any value but does not bind a value that can be
used on the right-hand-side of the definition.
The expressions in Haskell if
conditions
and guards on function definitions must be Bool
-valued
expressions. They can include calls to functions that return Bool
values.
Char
The Char
data type
is usually supported directly by the host processor by one or more
contiguous bytes.
Haskell uses Unicode for its character data type. Haskell supports
character literals enclosed in single quotes—including both the graphic
characters (e.g., ’a’
, ’0’
, and ’Z’
) and
special codes entered following the escape character backslash \
(e.g., '\n'
for newline, '\t'
for horizontal tab, and '\\'
for backslash itself).
In addition, a backslash character \
followed by a number generates the
corresponding Unicode character code. If the first character following
the backslash is o
, then the
number is in octal representation; if followed by x
, then in hexadecimal notation; and
otherwise in decimal notation.
For example, the exclamation point character can be represented in
any of the following ways: ’!’
, '\33'
,
'\o41'
,
'\x21'
t1 -> t2
If t1
and t2
are types then t1 -> t2
is
the type of a function that takes an argument of type t1
and returns a result of type t2
.
Function and variable names begin with lowercase letters optionally
followed by a sequences of characters each of which is a letter, a
digit, an apostrophe ('
)
(sometimes pronounced “prime”), or an underscore (_
).
Haskell functions are first-class objects. They can be arguments or results of other functions or be components of data structures. Multi-argument functions are curried-–that is, treated as if they take their arguments one at a time.
For example, consider the integer addition operation (+)
.
(Surrounding the binary operator symbol with parentheses refers to the
corresponding function.) In mathematics, we normally consider addition
as an operation that takes a pair of integers and yields an
integer result, which would have the type signature
(+) :: (Int,Int) -> Int
In Haskell, we give the addition operation the type
(+) :: Int -> (Int -> Int)
or just
(+) :: Int -> Int -> Int
since Haskell binds ->
from the right.
Thus (+)
is a one
argument function that takes some Int
argument
and returns a function of type Int -> Int
.
Hence, the expression ((+) 5)
denotes a function that takes one argument and returns that argument
plus 5
.
We sometimes speak of this (+)
operation
as being partially applied (i.e., to one argument instead of
two).
This process of replacing a structured argument by a sequence of simpler ones is called currying, named after American logician Haskell B. Curry who first described it.
The Haskell library, called the standard prelude (or just Prelude), contains a wide range of predefined functions including the usual arithmetic, relational, and Boolean operations. Some of these operations are predefined as infix operations.
(t1,t2,...,tn)
If t1
, t2
,
,
tn
are types, where n
is finite and
n >= 2
, then is a type consisting of n-tuples
where the various components have the type given for that position.
Each element in a tuple may have different types. The number of elements in a tuple is fixed.
Examples of tuple values with their types include the following:
'a',1) :: (Char,Int)
(0.0,0.0,0.0) :: (Double,Double,Double)
('a',False),(3,4)) :: ((Char, Bool), (Int, Int)) ((
We can also define a type synonym using the type
declaration and the use the synonym in further declarations as
follows:
type Complex = (Float,Float)
makeComplex :: Float -> Float -> Complex
= (r,i)` makeComplex r i
A type synonym does not define a new type, but it introduces an alias
for an existing type. We can use Complex
in
declarations, but it has the same effect as using (Float,Float)
expect that Complex
provides better documentation of the intent.
[t]
The primary built-in data structure in Haskell is the list,
a sequence of values. All the elements in a list must have the same
type. Thus we declare lists with notation such as [t]
to denote a list of zero or more
elements of type t
.
A list literal is a comma-separated sequence of values enclosed
between [
and ]
. For example, []
is an empty list and [1,2,3]
is a list of the first three positive integers in increasing order.
We will look at programming with lists in a later chapter.
String
In Haskell, a string is just a list of characters. Thus
Haskell defines the data type String
as a type
synonym :
type String = [Char]
We examine lists and strings in a later chapter, but, because we use strings in a few examples in this subsection, let’s consider them briefly.
A String
literal
is a sequence of zero or more characters enclosed in double quotes, for
example, "Haskell programming"
.
Strings can contain any graphic character or any special character
given as escape code sequence (using backslash). The special escape code
\&
is used to separate any character sequences that are
otherwise ambiguous.
For example, the string literal "Hotty\nToddy!\n"
is a string that has two newline characters embedded.
Also the string literal "\12\&3"
represents the two-element list ['\12','3']
.
The function show
returns
its argument converted to a String
.
Because strings are represented as lists, all of the Prelude functions for manipulating lists also apply to strings. We look at manipulating lists and strings in later chapters of this textbook.
In later chapters, we examine other important Haskell type concepts such as user-defined algebraic data types and type classes.
In this chapter (5), we examined general type systems concepts and explored Haskell’s builtin types.
For a similar presentation of the types in the Python 3 language, see reference [5].
In Chapters 6 and 7, we examine methods for developing Haskell programs using abstraction. We explore use of top-down stepwise refinement, modular design, and other methods in the context of Haskell.
For each of the following exercises, develop and test a Haskell function or set of functions.
Develop a Haskell function sumSqBig
that takes three Double
arguments and yields the sum of the squares of the two larger
numbers.
For example, (sumSqBig 2.0 1.0 3.0)
yields 13.0
.
Develop a Haskell function prodSqSmall
that takes three Double
arguments and yields the product of the squares of the two smaller
numbers.
For example, (prodSqSmall 2.0 4.0 3.0)
yields 36.0
.
Develop a Haskell function xor
that takes two Booleans and yields
the “exclusive-or” of the two values. An exclusive-or operation yields
True
when exactly one of its arguments is True
and
yields False
otherwise.
Develop a Haskell Boolean function implies
that takes two Booleans p
and q
and yields the Boolean result p
q
(i.e., logical implication).
That is, if p
is True
and q
is False
, then
the result is False
;
otherwise, the result is True
.
Note: This function is sometimes called nand
.
Develop a Haskell Boolean function div23n5
that takes an Int
and yields
True
if
and only if the integer is divisible by 2 or divisible by 3, but is not
divisible by 5.
For example, (div23n5 4)
,
(div23n5 6)
,
and (div23n5 9)
all yield True
and (div23n5 5)
,
(div23n5 7)
,
(div23n5 10)
,
(div23n5 15)
,
(div23n5 30)
all yield False
.
Develop a Haskell function notDiv
such that notDiv n d
yields True
if and
only if integer n
is not
divisible by d
.
For example, (notDiv 10 5)
yields False
and
(notDiv 11 5)
yields True
.
Develop a Haskell function ccArea
that takes the
diameters of two concentric circles (i.e., with the same center
point) as Double
values
and yields the area of the space between the circles. That is, compute
the area of the larger circle minus the area of the smaller circle.
(Hint: Haskell has a builtin constant pi
.)
For example, (ccArea 2.0 4.0)
yields approximately 9.42477796
.
Develop a Haskell function mult
that takes two natural
numbers (i.e., nonnegative integers in Int
) and
yields their product. The function must not use the multiplication
(*
) or
division (div
)
operators. (Hint: Multiplication can be done by repeated
addition.)
Develop a Haskell function addTax
that takes two Double
values
such that addTax c p
yield c
with a sales tax of p
percent added.
For example, (addTax 2.0 9.0)
yields 2.18
.
Also develop a function subTax
that is the inverse of addTax
. That is, (subTax (addTax c p) p)
yields c
.
For example, (subTax 2.18 9.0) = 2.0
.
The time of day can be represented by a tuple (hours,minutes,m)
where hours
and minutes
are Int
values
with 1 <= hours <= 12
and 0 <= minutes <= 59
,
and where m
is either the string
value "AM"
or "PM"
.
Develop a Boolean Haskell function comesBefore
that takes two time-of-day
tuples and determines whether the first is an earlier time than the
second.
A day on the calendar (usual Gregorian calendar [25] used in
the USA) can be represented as a tuple with three Int
values
(month,day,year)
where the year
is a positive integer, 1 <= month <= 12
,
and 1 <= day <= days_in_month
.
Here days_in_month
is the number
of days in the the given month
(i.e., 28, 29, 30, or 31) for the given year
.
Develop a Boolean Haskell function validDate d
that takes a date tuple
d
and yields True
if and
only if d
represents a valid
date.
For example, validDate (8,20,2018)
and validDate (2,29,2016)
yield True
and validDate (2,29,2017)
and validDate (0,0,0)
yield False
.
Note: The Gregorian calendar [25] was introduced by Pope Gregory of the Roman Catholic Church in October 1582. It replaced the Julian calendar system, which had been instituted in the Roman Empire by Julius Caesar in 46 BC. The goal of the change was to align the calendar year with the astronomical year.
Some countries adopted the Gregorian calendar at that time. Other countries adopted it later. Some countries may never have adopted it officially.
However, the Gregorian calendar system became the common calendar used worldwide for most civil matters. The proleptic Gregorian calendar [26] extends the calendar backward in time from 1582. The year 1 BC becomes year 0, 2 BC becomes year -1, etc. The proleptic Gregorian calendar underlies the ISO 8601 standard used for dates and times in software systems [27].
Develop a Haskell function roman
that takes an Int
) in the
range from 0 to 3999 (inclusive) and yields the corresponding Roman
numeral [28] as a string (using capital
letters). The function should halt with an appropriate error
messages
if the argument is below or above the range. Roman numerals use the
symbols shown in Table 5.1 and are
combined by addition or subtraction of symbols.
Roman | Decimal | |
---|---|---|
I | 1 | |
V | 5 | |
X | 10 | |
L | 50 | |
C | 100 | |
D | 500 | |
M | 1000 |
For the purposes of this exercise, we represent the Roman numeral
for 0 as the empty string. The Roman numerals for integers 1-20 are
I
, II
, III
, IV
,
V
, VI
, VII
, VIII
,
IX
, X
, XI
, XII
,
XIII
, XIV
, XV
, XVI
,
XVII
, XVII
, XIX
, and
XX
. Integers 40, 90, 400, and 900 are XL
,
XC
, CD
, and CM
.
Develop a Haskell function
minf :: (Int -> Int) -> Int
that takes a function g
and
yields the smallest integer m
such that 0 <= m <= 10000000
and g m == 0
.
It should throw an error
if there
is no such integer.
In Summer 2016, I adapted and revised the discussion Surveying the Basic Types from chapter 5 of my Notes on Functional Programming with Haskell [4]. In 2017, I incorporated the discussion into Section 2.4 in Chapter 2 Basic Haskell Functional Programming of my 2017 Haskell-based programming languages textbook.
In Spring and Summer 2018, I divided the previous Basic Haskell Functional Programming chapter into four chapters in the 2018 version of the textbook, now titled Exploring Languages with Interpreters and Functional Programming [6]. Previous sections 2.1-2.3 became the basis for new Chapter 4, First Haskell Programs; previous Section 2.4 became Section 5.3 in the new Chapter 5, Types (this chapter); and previous sections 2.5-2.7 were reorganized into new Chapter 6, Procedural Abstraction, and Chapter 7, Data Abstraction.
In Spring 2018, I wrote the general Type System Concepts section as a part of a chapter that discusses the type system of Python 3 [5] to support my use of Python in graduate CSci 658 (Software Language Engineering) course.
In Summer 2018, I revised the section to become Section 5.2 in Chapter 5 of the Fall 2018 version of ELIFP [6]. I also moved the “Kinds of Polymorphism” discussion from the 2017 List Programming chapter to the new subsection “Polymorphic Operations”. This textbook draft supported my Haskell-based offering of the core course CSci 450 (Organization of Programming Languages).
The type concepts discussion draws ideas from various sources:
my general study of a variety of programming, programming language, and software engineering over three decades [1–3,7–18].
the Wikipedia articles on the Liskov Substitution Principle [19], Polymorphism [20], Ad Hoc Polymorphism [22], Parametric Polymorphism [23], Subtyping [24], and Function Overloading [21]
I retired from the full-time faculty in May 2019. As one of my post-retirement projects, I am continuing work on this textbook. In January 2022, I began refining the existing content, integrating additional separately developed materials, reformatting the document (e.g., using CSS), constructing a bibliography (e.g., using citeproc), and improving the build workflow and use of Pandoc.
In 2022, I also added some discussion on the functions div
, quot
, mod
, rem
, fromIntegral
,
and show
because of their usefulness in the exercises in this and later
chapters.
I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.
Type, subtype, Liskov Substitution Principle, types of constants,
variables, and expressions, static vs. dynamic types, nominal
vs. structural types, polymorphic operations (ad hoc, overloading,
subtyping, parametric/generic), early vs. late binding, compile time
vs. runtime, polymorphic variables, basic Haskell types (Int
, Integer
, Bool
, Char
,
functions, tuples, lists, String
), type
aliases,
library (Prelude) functions, proleptic Gregorian calendar system, Roman
numerals.