Exploring Languages

with Interpreters

and Functional Programming

Chapter 42

**H. Conrad Cunningham**

**04 April 2022**

Copyright (C) 2017, 2018, 2022, H. Conrad Cunningham

Professor of Computer and Information Science

University of Mississippi

214 Weir Hall

P.O. Box 1848

University, MS 38677

(662) 915-7396 (dept. office)

Professor of Computer and Information Science

University of Mississippi

214 Weir Hall

P.O. Box 1848

University, MS 38677

(662) 915-7396 (dept. office)

**Browser Advisory:** The HTML version of this textbook
requires a browser that supports the display of MathML. A good choice as
of April 2022 is a recent version of Firefox from Mozilla.

Chapter 41 introduced formal concepts related to concrete syntax and gave two different concrete syntaxes for the ELI Calculator language.

This chapter (42) introduces the concepts related to abstract syntax and language semantics. It encodes the essential structure of any ELI Calculator expression as a Haskell algebraic data type and defines the semantics operationally using a Haskell evaluation function. The abstract syntax also enables the expression to be transformed in various ways, such as converting it to a simpler expression while maintaining an equivalent value.

TODO: Rethink statement of goals below.

The goals of this chapter are to:

explore the concepts of abstract syntax, abstract syntax trees, and expression evaluation

define the semantics of the ELI Calculator language by designing its abstract syntax and an evaluation function

examine techniques for abstract syntax tree simplification and manipulation

The *abstract syntax* of an expression seeks to represent only
the essential aspects of the expression’s structure, ignoring
nonessential, representation-dependent details of the concrete syntax
[1,2].

For example, parentheses represent structural details in the concrete syntaxes given in Chapter 41. This structural information can be represented directly in the abstract syntax; there is no need for parentheses to appear in the abstract syntax.

We can represent arithmetic expressions conveniently using a tree
data structure, where the nodes represent operations (e.g., addition)
and leaves represent values (e.g., constants or variables). This
representation is called a *abstract syntax tree* (AST) for the
expression [1,3].

In Haskell, we can represent an abstract syntax trees using algebraic data types. Such types often enable us to express programs concisely by using pattern matching.

For the ELI Calculator language, we define the `Expr`

algebraic
data type—in the Abstract Syntax module (`AbSynCalc`

)—to
describe the abstract syntax tree.

```
import Values ( ValType, Name )
data Expr = Add Expr Expr
| Sub Expr Expr
| Mul Expr Expr
| Div Expr Expr
| Var Name
| Val ValType
-- deriving Show?
instance Show Expr where
show (Val v) = show v
show (Var n) = n
show (Add l r) = showParExpr "+" [l,r]
show (Sub l r) = showParExpr "-" [l,r]
show (Mul l r) = showParExpr "+" [l,r]
show (Div l r) = showParExpr "/" [l,r]
showParExpr :: String -> [Expr] -> String
=
showParExpr op es "(" ++ op ++ " " ++ showExprList es ++ ")"
showExprList :: [Expr] -> String
= Data.List.intercalate " " (map show es) showExprList es
```

Above in type `Expr`

, the
constructors `Add`

, `Sub`

, `Mul`

, and `Div`

represent
the addition, subtraction, multiplication, and division, respectively,
of the two operand subexpressions, `Var`

represents a
variable with a name, and `Val`

represents
a constant value.

Note that this abstract syntax is similar to the (Lisp-like) parenthesized prefix syntax described in Chapter 41.

We make type `Expr`

an
instance of class `Show`

. We do
not derive or define an instance of the `Eq`

class
because direct structural equality of trees may not be how we want to
define equality comparisons.

We can thus express the example expressions from the Concrete Syntax chapter as follows:

```
Val 3 -- 3
Val (-3) -- -3
Var "x" -- x
Add (Val 1) (Val 1) -- 1+1
Add (Var "x") (Val 3) -- x + 3
-- (x + y) * (2 - z)
Mul (Add (Var "x") (Var "y")) (Sub (Val 2) (Var "z"))
```

Figures 42.1 and 42.2 show abstract syntax trees for two example expressions above.

In Chapter 44 on parsing, we
develop parsers for both the prefix and infix syntaxes. Both parsers
construct abstract syntax trees using the algebraic data type `Expr`

.

The ELI Calculator language restricts values to `ValType`

. The
`Values`

module indirectly defines this type synonym to be `Int`

.

The abstract syntax allows a name to be represented by any string
(i.e., type alias `Name`

, which is
defined to be `String`

in the
`Values`

module). We likely want to restrict names to follow the usual
“identifier” syntax. The parser for the concrete syntax should enforce
this restriction. Or we could define Haskell functions to parse and
construct identifiers, such as the functions below.

```
import Data.Char ( isAlpha, isAlphaNum )
getId :: String -> (Name,String)
= ([],[])
getId [] @(x:_)
getId xs| isFirstId x = span isRestId xs
| otherwise = ([],xs)
where
= isAlpha c || c == '_'
isFirstId c = isAlphaNum c || c == '_'
isRestId c
identifier :: String -> Maybe Name
=
identifier xs case getId xs of
@(_:_),[]) -> Just xs
(xsotherwise -> Nothing
```

The `getId`

function takes a
string and parses an identifier at the beginning of the string. A valid
identifier must begin with an alphabetic or underscore character and
continue with zero or more alphabetic, numeric, or underscore
characters.

The `getId`

function uses the
higher order function `span`

to
collect the characters that form the identifier. This function takes a
predicate and returns a pair, of which the first component is the prefix
string satisfying the predicate and the second is the remaining
string.

In Chapter 44, we examine how to parse an expression’s concrete syntax to build an abstract syntax tree.

In language processing, we often need to associate some key (e.g., a
variable name) with its value. There are several names for this type of
data structure—*associative
array* [4], *dictionary*,
*map*, *symbol table*, etc.

As we saw in Chapter 21, an
*association list* is a simple list-based implementation of this
concept [4]. It is a list of pairs in
which the first component is the *key* (e.g., a string) and the
second component is the *value* associated with the key.

The Prelude function `lookup`

, shown
below (and in Chapter 21), searches
an association list for a key and returns a `Maybe`

value.
If it finds the key, it wraps the associated value in a `Just`

; if it
does not find the key, it returns a `Nothing`

.

```
lookup :: (Eq a) => a -> [(a,b)] -> Maybe b
lookup _ [] = Nothing
lookup key ((x,y):xys)
| key == x = Just y
| otherwise = lookup key xys
```

For better performance with larger dictionaries, we can replace an
association list by a more efficient data structure such as a `Data.Map.Map`

.
This structure implements the dictionary structure as a size-balanced
tree. It provides a `lookup`

function with essentially the same interface.

Of course, imperative languages might use a mutable *hash
table* to implement a dictionary.

Consider the evaluation of the ELI Calculator language abstract syntax trees as defined above.

To evaluate an expression, we must determine the current value of
each variable occurring in the expression. That is, we must evaluate the
expression in some *environment* that associates the variable
names with their values.

For example, consider the expression `x + 3`

. It might be
evaluated in an environment that associates the value `5`

with the variable `x`

, written `{ x -> 5 }`

.
The evaluation of this expression yields the value `8`

.

The environment `{ x -> 5 }`

can be expressed in a
number of ways in Haskell. Here we choose to represent it as a simple
association list as follows:

`"x",5)] [(`

This list associates a variable name in the first component with its integer value in the second component.

Looking up a key in an association list is an O(`n`

)
operation where `n`

denotes the number of key-value
pairs.

As noted above, a good alternative to the association list is a `Map`

from the
`Data.Map`

library. It implements the dictionary as an immutable, size-balanced
tree, thus its `lookup`

function is an O(`log2 n`

) operation.

In the ELI Calculator language implementation, we encapsulate the
representation of the environment in the `Environments`

module. This module
exports the following type synonym and functions:

```
type AnEnv a =[(Name,a)]
newEnv :: AnEnv a
toList :: AnEnv a -> [(Name,a)]
getBinding :: Name -> AnEnv a -> Maybe a
hasBinding :: Name -> AnEnv a -> Bool
newBinding :: Name -> a -> AnEnv a -> AnEnv a
setBinding :: Name -> a -> AnEnv a -> AnEnv a
bindList :: [(Name,a)] -> AnEnv a -> AnEnv a
```

For the purposes of our evaluation program, we can then define a
specific environment with the type synonym `Env`

in the Evaluator (`EvalCalc`

)
module as follows:

```
import Values ( ValType, Name, defaultVal )
import AbSynExpr ( Expr(..) )
import Environments ( AnEnv, Name, newEnv, toList, getBinding,
hasBinding, newBinding, setBinding,
bindList )
type Env = AnEnv ValType
```

We express the *semantics* (i.e., meaning) of the various ELI
Calculator language expressions (i.e., nodes of the AST) as follows.

`c`

evaluates to the constant (`NumType`

) value`c`

.`Var n`

evaluates to the value of variable`n`

in the environment, generating an error if the variable is not defined.`Add l r`

evaluates to the sum of the values of the expression trees`l`

and`r`

.`Sub l r`

evaluates to the difference between the values of the expression trees`l`

and`r`

.`Mul l r`

evaluates to the product of the values of the expression trees`l`

and`r`

.`Div l r`

evaluates to the quotient of the values of the expression trees`l`

and`r`

. Division by zero is not defined.

Operations `Add`

, `Sub`

, `Mul`

, and `Div`

are
*strict*. They are undefined if any of their subexpressions are
undefined.

We can thus define a Haskell *evaluation function* (i.e.,
*interpreter*) for the ELI Calculator language as follows.

This function in the Evaluator
module (`EvalCalc`

) does
a *post-order traversal* of the abstract syntax tree, first
computing the values of the child subexpressions and then computing the
value of of a node. The value is returned wrapped in an `Either`

, where
the `Left`

constructor represents an error message and the `Right`

constructor a good value.

```
import Values ( ValType, Name, defaultVal )
import AbSynExpr ( Expr(..) )
import Environments ( AnEnv, Name, newEnv, toList, getBinding,
hasBinding, newBinding, setBinding,
bindList )type EvalErr = String
type Env = AnEnv ValType
eval :: Expr -> Env -> Either EvalErr ValType
Val v) _ = Right v
eval (Var n) env =
eval (case getBinding n env of
Nothing -> Left ("Undefined variable " ++ n)
Just i -> Right i
Add l r) env =
eval (case (eval l env, eval r env) of
Right lv, Right rv) -> Right (lv + rv)
(Left le, Left re ) -> Left (le ++ "\n" ++ re)
(@(Left le), _ ) -> x
(x@(Left le)) -> y
(_, ySub l r) env =
eval (case (eval l env, eval r env) of
Right lv, Right rv) -> Right (lv - rv)
(Left le, Left re ) -> Left (le ++ "\n" ++ re)
(@(Left le), _ ) -> x
(x@(Left le)) -> y
(_, yMul l r) env =
eval (case (eval l env, eval r env) of
Right lv, Right rv) -> Right (lv * rv)
(Left le, Left re ) -> Left (le ++ "\n" ++ re)
(@(Left le), _ ) -> x
(x@(Left le)) -> y
(_, yDiv l r) env =
eval (case (eval l env, eval r env) of
Right _, Right 0 ) -> Left "Division by 0"
(Right lv, Right rv) -> Right (lv `div` rv)
(Left le, Left re ) -> Left (le ++ "\n" ++ re)
(@(Left le), _ ) -> x
(x@(Left le)) -> y (_, y
```

Consider an example with a simple `main`

function below (that could be
added to the `EvalExpr`

module) that evaluates the example expressions from a previous section.
(See the extended
Evaluator module (`EvalCalcExt`

).)

```
=
main do
let env = [("x",5), ("y",7),("z",1)]
let exp1 = Val 3 -- 3
let exp2 = Var "x" -- x
let exp3 = Add (Val 1) (Val 2) -- 1+2
let exp4 = Add (Var "x") (Val 3) -- x + 3
let exp5 = Mul (Add (Var "x") (Var "y"))
Add (Val 2) (Var "z")) -- (x + y) * (2 + z)
(putStrLn ("Expression: " ++ show exp1)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp1 env))
putStrLn ("Expression: " ++ show exp2)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp2 env))
putStrLn ("Expression: " ++ show exp3)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp3 env))
putStrLn ("Expression: " ++ show exp4)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp4 env))
putStrLn ("Expression: " ++ show exp5)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp5 env))
```

When `main`

is called, it first
computes he values of the various expressions in the environment
`{ x -> 5, y -> 7 }`

and then prints their results.

```
Expression: 3
Evaluation with x=5, y=7, z=1: Right 3
Expression: x
Evaluation with x=5, y=7, z=1: Right 5
Expression: (+ 1 2)
Evaluation with x=5, y=7, z=1: Right 3
Expression: (+ x 3)
Evaluation with x=5, y=7, z=1: Right 8
Expression: (* (+ x y) (+ 2 z))
Evaluation with x=5, y=7, z=1: Right 36
```

TODO: Should the discussion of Simplification and Differentiation be in the main line of the chapter or separated into a project (or projects) with exercises? Simplfication is related to the global

An expression may be more complex than necessary. We can
*simplify* it, perhaps with the intention of *optimizing*
its evaluation.

An operation whose operands are constants can be simplified by
replacing it by the appropriate constant. For example, `Add (Val 3) (Val 4)`

is the same semantically as `Val 7`

.

Similarly, we can take advantages of an operation’s identity element
and other mathematical properties to simplify expressions. For example,
`Add (Val 0) (Var "x")`

is the same as `Var "x"`

.

We can thus define a skeletal function `simplify`

as follows. As with `eval`

, the `simplify`

function traverses the
abstract syntax tree using a post-order traversal.

```
simplify :: Expr -> Expr
Add l r) =
simplify (case (simplify l, simplify r) of
Val 0, rr) -> rr
(Val 0) -> ll
(ll, Val x, Val y) -> Val (x+y)
(-> Add ll rr
(ll, rr) Mul l r) =
simplify (case (simplify l, simplify r) of
Val 0, rr) -> Val 0
(Val 0) -> Val 0
(ll, Val 1, rr) -> rr
(Val 1) -> ll
(ll, Val x, Val y) -> Val (x*y)
(-> Mul ll rr
(ll, rr) @(Var _) = t
simplify t@(Val _) = t simplify t
```

In an exercise, you are asked to complete the development of this function.

See the incomplete Process AST module (`ProcessAST`

)
for the sample code in this section and the next one.

Suppose that we redefine the `Expr`

type to
support double precision floating point (i.e., `Double`

)
values.

Then let’s consider *symbolic differentiation* of the
arithmetic expressions. Thinking back to our study of differential
calculus, we identify the following rules for differentiation:

The derivative of a sum is the sum of the derivatives.

The derivative of a product of two operands is the sum of the product of (a) the first operand and the derivative of the second and (b) the second operand and the derivative of the first.

The derivative of some variable

`v`

is 1 if differentiation is relative to`v`

and is 0 otherwise.The derivative of a constant is 0.

We can directly translate these rules into a skeletal Haskell function that uses the above data types, as follows:

```
deriv :: Expr -> Name -> Expr
Add l r) v = Add (deriv l v) (deriv r v)
deriv (Mul l r) v = Add (Mul l (deriv r v)) (Mul r (deriv l v))
deriv (Var n) v
deriv (| v == n = Val 1
= Val 0 deriv _ _
```

See the incomplete Process AST module (`ProcessAST`

)
for the sample code in this section.

Chapter 41 presented concrete syntax concepts, illustrating them with two different concrete syntaxes for the ELI Calculator language.

This chapter (42) presented abstract syntax trees as structures for representing the essential features of the syntax in a form that can be evaluated directly. The same abstract syntax can encode either of the two concrete syntaxes for the ELI Calculator language.

Chapter 44 introduces lexical analysis and parsing as techniques for processing concrete syntax expressions to generate the equivalent abstract syntax trees.

Before we look at parsing, let’s examine the overall modular structure of the ELI Calculator language interpreter in Chapter 43.

This chapter involves several of the ELI Calculator language modules:

Abstract Syntax module (

`AbSynCalc`

)—to describe the abstract syntax tree.extended Evaluator module (

`EvalCalcExt`

).)

It also has the incomplete Process AST module (`ProcessAST`

)
related to the simplification and differentiation discussion and
exercises.

Extend the abstract syntax tree data type

`Expr`

, which is defined in the Abstract Syntax module (`AbSynCalc`

), to add new operations`Neg`

(negation),`Min`

(minimum),`Max`

(maximum), and`Exp`

(exponentiation).`data Expr = ... | Neg Expr | Min Expr Expr | Max Expr Expr | Exp Expr Expr ... deriving Show`

Then extend the

`eval`

function, which is defined in the Evaluator module (`EvalCalc`

), to add these new operations with the following informal semantics:`Neg e`

negates the value of expression`e`

. For example,`Neg (Val 1)`

yields`(Val (-1))`

.`Min l r`

yields the smaller value of expression`l`

and expression`r`

.`Max l r`

yields the larger value of expression`l`

and`r`

.`Exp l r`

raises the value of expression`l`

to a power that is the value of expression`r`

. It is undefined for a negative exponent value`r`

.

These operations are all strict; they only have values if all their subexpressions also have values.

Extend the

`simplify`

function to support operations`Sub`

and`Div`

and the new operations given in the previous exercise.This function should simplify the abstract syntax tree by evaluating subexpressions involving only constants (not evaluating variables) and handling special values like identity and zero elements.

Extend the

`simplify`

function from the previous exercise in other ways. For example, take advantage of mathematical properties such as*associativity*(`(x + y) + z = x + (y + z)`

),*commutativity*(`x + 1 = 1 + x`

), and*idempotence*(`x min x = x`

).Extend the abstract syntax tree data type

`Expr`

to include the binary operators`Eq`

(equality) and`Lt`

(less-than comparison), logical unary operator`Not`

, and the ternary conditional expression`If`

(if-then-else).`data Expr = ... | Eq Expr Expr | Lt Expr Expr | Not Expr | If Expr Expr Expr ... deriving Show`

Then extend the

`eval`

function to implement these new operations.This extended language does not have Boolean values. We represent “false” by integer 0 and “true” by a nonzero integer, canonically by 1.

We can express the informal semantics of the new ELI Calculator language expressions as follows:

`Eq l r`

yields the value 1 if expressions`l`

and`r`

have the same value; it yields the value 0 if`l`

and`r`

have different values.`Lt l r`

yields the value 1 if the value of expression`l`

is smaller than the value of expression`r`

; it yields the value 0 if`l`

is greater than or equal to`r`

.`Not i`

yields 1 if the value of expression`i`

is 0; it yields the value 0 if`i`

is nonzero.`If c l r`

first evaluates expression`c`

. If`c`

has a nonzero value, the`If`

yields the value of expression`l`

. If`c`

has value 0, the`If`

yields the value of expression`r`

.

Operations

`Eq`

,`Lt`

, and`Not`

are strict for all subexpressions; that is, they are undefined if any subexpression is undefined.Operation

`If`

is strict in its first subexpression`c`

.Note: The constants

`falseVal`

and`trueVal`

and the functions`boolToVal`

and`valToBool`

in the`Values`

module may be helpful. (The intention of the`Values`

module is to keep the representation of the values hidden from the rest of the interpreter. In particular, these constants and functions these are to help encapsulate the representation of booleans as the underlying values.)Extend the abstract syntax tree data type

`Expr`

from the previous exercise (which defines operator`If`

) to include a`Switch`

expression.`data Expr = ... | Switch Expr Expr [Expr] ... deriving Show`

Then extend the

`eval`

function to implement this new operation.We can express the informal semantics of this new ELI Calculator language expression as follows:

`Switch n def exs`

first evaluates expression`n`

. If the value of`n`

is greater than or equal to 0 and less than`length exs`

, then the`Switch`

yields the value of the`n`

th expression in list`exs`

(where the first element is at index 0). Otherwise, the`Switch`

yields the value of the default expression`def`

.

Develop an object-oriented program (e.g., in Java) to carry out the same functionality as the

`Expr`

data type and`eval`

function described in this chapter. That is, define a class hierarchy that corresponds to the`Expr`

data type and use the message-passing style to implement the needed classes and instances.Extend the object-oriented program from the previous exercise to the

`Neg`

,`Min`

,`Max`

, and`Exp`

as described in an earlier exercise.Extend the object-oriented program from the previous exercise to implement the

`Eq`

,`Lt`

,`Not`

, and`If`

as described in another earlier exercise.Extend the object-oriented program above to implement simplification.

For this exercise, redefine the

`Expr`

data type above to hold`Double`

constants instead of`Int`

. In addition to`Add`

,`Mul`

,`Sub`

,`Div`

,`Neg`

,`Min`

,`Max`

, and`Exp`

, extend the data type and`eval`

function to include the trigonometric operators`Sin`

and`Cos`

for sine and cosine.Using the extended

`Double`

version of`Expr`

from the previous exercise, extend function`deriv`

to support all the operators in the data type.

For the general acknowledgements for the ELI Calculator case study and Chapters 41-46 through Spring 2019, see the Acknowledgements section of Chapter 41.

I retired from the full-time faculty in May 2019. As one of my post-retirement projects, I am continuing work on this textbook. In January 2022, I began refining the existing content, integrating additional separately developed materials, reformatting the document (e.g., using CSS), constructing a unified bibliography (e.g., using citeproc), and improving the build workflow and use of Pandoc.

I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.

Abstract syntax, abstract syntax tree (AST), associative data structure, environment, value, semantics, evaluation function, interpreter, simplification, optimization, symbolic differentiation, associativity, commutativity (symmetry), idempotence.

[1]

Peter Sestoft. 2012. *Programming language
concepts* (First ed.). Springer, London, UK.

[2]

Wikpedia: The Free Encyclopedia. 2022. Abstract
syntax. Retrieved from https://en.wikipedia.org/wiki/Abstract_syntax

[3]

Wikpedia: The Free Encyclopedia. 2022. Abstract
syntax tree. Retrieved from https://en.wikipedia.org/wiki/Abstract_syntax_tree

[4]

Wikpedia: The Free Encyclopedia. 2022.
Association list. Retrieved from https://en.wikipedia.org/wiki/Association_list