Here we examine some of the more advanced aspects of type declarations.
A common programming practice is to define a type whose representation
is identical to an existing one but which has a separate identity in
the type system.
In Haskell, the newtype declaration creates a new type from an
existing one. For example, natural numbers can be represented by
the type Integer using the following declaration:
newtype Natural = MakeNatural Integer
This creates an entirely new type, Natural, whose only
constructor contains a single Integer. The constructor MakeNatural
converts between a Natural and an Integer:
toNatural :: Integer -> Natural
toNatural x | x < 0     = error "Can't create negative naturals!"
            | otherwise = MakeNatural x
fromNatural :: Natural -> Integer
fromNatural (MakeNatural i) = i
The following instance declaration admits Natural to the Num class:
instance Num Natural where
    fromInteger = toNatural
    x + y       = toNatural (fromNatural x + fromNatural y)
    x - y       = let r = fromNatural x - fromNatural y in
                    if r < 0 then error "Unnatural subtraction"
                             else toNatural r
    x * y       = toNatural (fromNatural x * fromNatural y)
Without this declaration, Natural would not be in Num. Instances
declared for the old type do not carry over to the new one. Indeed,
the whole purpose of this type is to introduce a different Num
instance. This would not be possible if Natural were
defined as a type synonym of Integer.
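As a rough illustration of how the instance is used, the sketch below repeats the declarations from this section and then does some arithmetic on Natural values. The derived Eq and Show instances, the abs and signum methods, and the names three and six are assumptions added here so that the example compiles cleanly and its results can be compared; they are not part of the original declaration.

```haskell
newtype Natural = MakeNatural Integer
  deriving (Eq, Show)   -- assumption: derived only to compare and print results

toNatural :: Integer -> Natural
toNatural x
  | x < 0     = error "Can't create negative naturals!"
  | otherwise = MakeNatural x

fromNatural :: Natural -> Integer
fromNatural (MakeNatural i) = i

instance Num Natural where
  fromInteger = toNatural
  x + y = toNatural (fromNatural x + fromNatural y)
  x - y = let r = fromNatural x - fromNatural y
          in if r < 0 then error "Unnatural subtraction" else toNatural r
  x * y = toNatural (fromNatural x * fromNatural y)
  abs    = id                                 -- assumption: not in the text above
  signum = toNatural . signum . fromNatural   -- assumption: not in the text above

-- Integer literals are overloaded through fromInteger, so they can be
-- used directly at type Natural.
three :: Natural
three = 1 + 2

six :: Natural
six = three * 2
```

An expression such as three - six type checks but fails at run time with "Unnatural subtraction", which is exactly the behaviour the separate Num instance was introduced to provide.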
All of this works using a data declaration instead of a newtype declaration. However, the data declaration incurs extra overhead in the representation of Natural values. The use of newtype avoids the extra level of indirection (caused by laziness) that the data declaration would introduce. See section 4.2.3 of the report for more discussion of the relation between newtype, data, and type declarations. [Except for the keyword, the newtype declaration uses the same syntax as a data declaration with a single constructor containing a single field. This is appropriate since types defined using newtype are nearly identical to those created by an ordinary data declaration.]
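One observable consequence of the missing indirection is that pattern matching on a newtype constructor does not evaluate anything, while matching on a data constructor does. The following is a minimal sketch; the types N and D and the functions isN, isD, and okN are hypothetical names chosen for this example.

```haskell
newtype N = MkN Int   -- no separate run-time representation for MkN
data    D = MkD Int   -- MkD is a real constructor with its own representation

-- Matching on MkN does not force the argument ...
isN :: N -> Bool
isN (MkN _) = True

-- ... whereas matching on MkD must first evaluate the argument
-- far enough to find the constructor.
isD :: D -> Bool
isD (MkD _) = True

-- The argument is never inspected, so this is simply True.
okN :: Bool
okN = isN undefined

-- By contrast, isD undefined would fail at run time, because the
-- undefined value has to be evaluated to look for MkD.
```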
The fields within a Haskell data type can be accessed either
positionally or by name using field labels.
Consider a data type for a two-dimensional point:
data Point = Pt Float Float
The two components of a Point are the first and second arguments to the
constructor Pt. A function such as
pointx :: Point -> Float
pointx (Pt x _) = x
may be used to refer to the first component of a point in a more
descriptive way, but, for large structures, it becomes tedious to
create such functions by hand.
Constructors in a data declaration may be declared
with associated field names, enclosed in braces. These field names
identify the components of a constructor by name rather than by position.
This is an alternative way to define Point:
data Point = Pt {pointx, pointy :: Float}
This data type is identical to the earlier definition
of Point. The constructor Pt is the same in both cases. However,
this declaration also defines two field names, pointx
and pointy. These field names can be used as selector functions to
extract a component from a structure. In this example, the selectors
are:
pointx :: Point -> Float
pointy :: Point -> Float
This is a function using these selectors:
absPoint :: Point -> Float
absPoint p = sqrt (pointx p * pointx p +
                   pointy p * pointy p)
Field labels can also be used to construct new values. The expression Pt {pointx=1, pointy=2} is identical to Pt 1 2. The use of field names in the declaration of a data constructor does not preclude the positional style of field access; both Pt {pointx=1, pointy=2} and Pt 1 2 are allowed. When constructing a value using field names, some fields may be omitted; these absent fields are undefined.
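The construction styles described above can be compared side by side. This is a small sketch that reuses the Point declaration from this section; the derived Eq and Show instances and the names p1, p2, p3, and partial are assumptions introduced for the example.

```haskell
data Point = Pt {pointx, pointy :: Float}
  deriving (Eq, Show)   -- assumption: derived only so values can be compared

p1, p2, p3 :: Point
p1 = Pt 1 2                         -- positional construction
p2 = Pt {pointx = 1, pointy = 2}    -- construction by field name
p3 = Pt {pointy = 2, pointx = 1}    -- labelled fields may appear in any order

-- pointy is omitted here, so it is undefined. A compiler may warn about
-- the missing field; selecting pointx still works because the field is lazy.
partial :: Point
partial = Pt {pointx = 3}
```

All three of p1, p2, and p3 denote the same value. Evaluating pointy partial would fail at run time, since that field was never given a value.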
Pattern matching using field names uses a similar syntax for the
constructor Pt:
absPoint (Pt {pointx = x, pointy = y}) = sqrt (x*x + y*y)
An update function uses field values in an existing structure to fill in components of a new structure. If p is a Point, then p {pointx=2} is a point with the same pointy as p but with pointx replaced by 2. This is not a destructive update: the update function merely creates a new copy of the object, filling in the specified fields with new values.
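The non-destructive nature of update can be seen in a short sketch. The Point declaration is repeated from above; the derived instances and the names origin, moved, and shiftX are assumptions made for this example.

```haskell
data Point = Pt {pointx, pointy :: Float}
  deriving (Eq, Show)   -- assumption: derived only so values can be compared

origin :: Point
origin = Pt {pointx = 0, pointy = 0}

-- A new Point that shares pointy with origin; origin itself is unchanged.
moved :: Point
moved = origin {pointx = 2}

-- Updates can read the old value through the selector function.
shiftX :: Float -> Point -> Point
shiftX dx p = p {pointx = pointx p + dx}
```

After these definitions origin is still Pt 0 0, while moved is Pt 2 0 and shiftX 1 moved is Pt 3 0.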
[The braces used in conjunction with field labels are somewhat special: Haskell syntax usually allows braces to be omitted using the layout rule (described in Section 4.6). However, the braces associated with field names must be explicit.]
Field names are not restricted to types with a single constructor (commonly called `record' types). In a type with multiple constructors, selection or update operations using field names may fail at runtime. This is similar to the behavior of the head function when applied to an empty list.
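To make the failure case concrete, here is a sketch using a hypothetical Shape type (not from the text above). The selector radius is only defined for one of the two constructors, so one way to avoid the run-time failure is to pattern match and return a Maybe, as the assumed helper safeRadius does.

```haskell
data Shape
  = Circle {radius :: Float}
  | Rect   {width, height :: Float}
  deriving (Eq, Show)   -- assumption: derived only so values can be compared

-- radius is a partial function: radius (Rect 1 2) fails at run time,
-- much as head [] does.

-- A total alternative that uses field-name pattern matching.
safeRadius :: Shape -> Maybe Float
safeRadius (Circle {radius = r}) = Just r
safeRadius _                     = Nothing
```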
Field labels share the top level namespace with ordinary variables and
class methods.
A field name cannot be used in more than one data type in scope.
However, within a data type, the same field
name can be used in more than one of the constructors so long as it
has the same typing in all cases. For example, in this data type
data T = C1 {f :: Int, g :: Float}
       | C2 {f :: Int, h :: Bool}
the field name f applies to both constructors in T. Thus if
x is of type T, then x {f=5} will work for values created by
either of the constructors in T.
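The shared field can be exercised as follows. This sketch repeats the declaration of T; the derived instances and the names setF, a, and b are assumptions added for illustration.

```haskell
data T = C1 {f :: Int, g :: Float}
       | C2 {f :: Int, h :: Bool}
  deriving (Eq, Show)   -- assumption: derived only so values can be compared

-- f belongs to both constructors, so this update is defined for every T.
setF :: T -> T
setF x = x {f = 5}

a, b :: T
a = setF (C1 {f = 1, g = 2.5})
b = setF (C2 {f = 1, h = True})
```

Here a is C1 5 2.5 and b is C2 5 True. An update of g or h, by contrast, would only succeed for the constructor that declares that field.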
Field names do not change the basic nature of an algebraic data type; they are simply a convenient syntax for accessing the components of a data structure by name rather than by position. They make constructors with many components more manageable since fields can be added or removed without changing every reference to the constructor. For full details of field labels and their semantics, see §4.2.1.
Internally, each field of a lazy data object is wrapped up in a structure commonly referred to as a thunk that encapsulates the computation defining the field value. This thunk is not entered until the value is needed; thunks which contain errors (_|_) do not affect other elements of a data structure. For example, the tuple ('a',_|_) is a perfectly legal Haskell value. The 'a' may be used without disturbing the other component of the tuple. Most programming languages are strict instead of lazy: that is, all components of a data structure are reduced to values before being placed in the structure.
There are a number of overheads associated with thunks: they take time to construct and evaluate, they occupy space in the heap, and they cause the garbage collector to retain other structures needed for the evaluation of the thunk. To avoid these overheads, strictness flags in data declarations allow specific fields of a constructor to be evaluated immediately, selectively suppressing laziness. A field marked by ! in a data declaration is evaluated when the structure is created instead of delayed in a thunk. There are a number of situations where it may be appropriate to use strictness flags:
Strictness flags may be used to address memory leaks: structures retained by the garbage collector but no longer necessary for computation.
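The difference between lazy and strict fields can be sketched with two otherwise identical types. LazyPair, StrictPair, and the functions below are hypothetical names used only for this illustration.

```haskell
data LazyPair   = LazyPair Int Int       -- both fields are held as thunks
data StrictPair = StrictPair !Int !Int   -- both fields are evaluated on construction

lazyFirst :: LazyPair -> Int
lazyFirst (LazyPair x _) = x

strictFirst :: StrictPair -> Int
strictFirst (StrictPair x _) = x

-- The second field is never needed, so its thunk is never entered.
fine :: Int
fine = lazyFirst (LazyPair 1 undefined)

-- The same holds for ordinary tuples, which are lazy in both components.
lazyTuple :: Char
lazyTuple = fst ('a', undefined)

-- strictFirst (StrictPair 1 undefined) would fail at run time: the flag
-- forces the second field as soon as the constructor is evaluated.
```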
The strictness flag, !, can only appear in data declarations. It cannot be used in other type signatures or in any other type definitions. There is no corresponding way to mark function arguments as being strict, although the same effect can be obtained using the seq or $! functions. See §4.2.1 for further details.
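As a sketch of how seq and $! can stand in for a strict function argument, the two functions below sum a list with an accumulator that is evaluated at every step instead of building a chain of thunks. The names sumStrict and sumStrict' are assumptions made for this example.

```haskell
-- Force the accumulator with seq before making the recursive call.
sumStrict :: [Integer] -> Integer
sumStrict = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = acc + x
                    in acc' `seq` go acc' xs

-- The same idea written with $!, which evaluates its right-hand
-- argument before applying the function on its left.
sumStrict' :: [Integer] -> Integer
sumStrict' = go 0
  where
    go acc []     = acc
    go acc (x:xs) = (go $! acc + x) xs
```

Both functions return the same results as a lazy accumulator would; only the order of evaluation, and hence the space behaviour, differs.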
It is difficult to present exact guidelines for the use of strictness flags. They should be used with caution: laziness is one of the fundamental properties of Haskell, and adding strictness flags may lead to hard-to-find infinite loops or have other unexpected consequences.