7 November 2017
Copyright (C) 2017, H. Conrad Cunningham
Advisory: The HTML version of this document may require use of a browser that supports the display of MathML. A good choice as of November 2017 is a recent version of Firefox from Mozilla.
TODO:
The case study in this and the following two chapters (Chapters 10-12) examines how we can represent and process simple arithmetic expressions using Haskell. We call the language used in this case study the Expression Language.
Chapter 10 examines two different concrete syntaxes for expressions written as text and an abstract syntax represented as an algebraic data type. Chapter 11 considers the parsing of the syntaxes.
Chapter 10 explores direct interpretation (evaluation) of the abstract syntax trees (ASTs) and manipulation of the trees to simplify or differentiate the expressions.
Chapter 12 looks at a simple Stack Virtual Machine with an instruction set represented as another algebraic data type and how to translate (i.e., compile), how to execute the machine, and how to translate the abstract syntax trees to sequences of instructions.
We will extend it with other features in later chapters.
TODO: Add outcomes
The Expression Language can be represented in human-readable text strings in forms similar to traditional mathematical and programming notations. The structure of these textual expressions is called the concrete syntax of the expressions.
In this case study, we examine two possible concrete syntaxes: a familiar infix syntax and a (probably less familiar) parenthesized prefix syntax.
But, first, let’s consider how we can describe the syntax of a language.
We usually describe the syntax of a language using a grammar.
Formally, a grammar consists of a tuple , where
Production rules describe how the grammar transforms one sequence of symbols to another. The rules have the general form
where and are sequences of symbols from such that has length of at least one symbol.
A sentence in a language consists of any finite sequence of terminal symbols that can be generated from the start symbol of a grammar by a finite sequence of productions from the grammar.
We call a sequence of productions that generates a sentence a derivation for that sentence.
Any intermediate sequence of symbols in a derivation is called a sentential form.
The language generated by the grammar is the set of all sentences that can be generated by the grammar.
To express the syntax of programming languages, we normally restrict ourselves to the family of context-free grammars (and its subfamilies). In a context-free grammar (CFG), the production rules have the form
where and is a sequence of zero or more symbols from . This means that an occurence of nonterminal can be replaced by the sequence .
We often express a grammar using a metalanguage such as the Backus-Naur Form (BNF).
For example, consider the following BNF description of a grammar for the unsigned binary integers:
<binary> ::= <digit>
<binary> ::= <digit> <binary>
<digit> ::= '0'
<digit> ::= '1'
The nonterminals are the symbols shown in angle brackets: <binary>
and <digit>
.
The terminals are the symbols shown in single quotes: '0'
and '1'
.
The production rules are shown with a nonterminal on the left side of the metasymbol ::=
and its replacement sequence of nonterminal and terminal symbols on the right side.
Unless otherwise noted, the start symbol is the nonterminal on the left side of the first production rule.
For multiple rules with the same left side, we can use the |
metasymbol to write the alternative right sides concisely. The four rules above can be written as follows:
<binary> ::= <digit> | <digit> <binary>
<digit> ::= '0' | '1'
We can also use the extended BNF metasymbols:
{
and }
to denote that the symbols between the braces are repeated zero or more times
[
and ]
to denote that the symbols between the brackets are optional (i.e., occur at most once)
Consider a derivation of the sentence 101
using the grammar for unsigned binary numbers above.
<binary>
<digit> <binary>
<digit> <digit> <binary>
<digit> 0 <binary>
1 0 <binary>
1 0 <digit>
1 0 1
This is not the only possible derivation for 101
. Let’s consider a second derivation of 101
.
<binary>
<digit> <binary>
1 <binary>
1 <digit> <binary>
1 0 <binary>
1 0 <digit>
1 0 1
The second derivation applies the same rules the same number of times, but it applies them in a different order. This case is called the leftmost derivation because it always replaces the leftmost nonterminal in the sentential form.
Both of the above derivations can be represented by the derivation tree (or parse tree) shown in Figure 1. (The numbers below the nodes show the rules applied.)
The grammar above for binary numbers is a special case of a context-free grammar called a right-linear grammar. In a right-linear grammar, all productions are of the forms
where and are nonterminals and is a sequence of zero or more terminals. Similarly, a left-linear grammar must have all productions of the form:
A grammar that is either right-linear or left-linear is called a regular grammar. (Note that all productions in a grammar must satisfy either the right- or left-linear definitions. They cannot be mixed.)
We can recognize sentences in a regular grammar with a simple “machine” (program) – a deterministic finite automaton (dfa).
In general, we must use a more complex “machine” – a pushdown automaton (PDA)– to recognize a context-free grammar.
We leave a more detailed study of regular and context-free grammars to courses on formal languages, automata, or compiler construction.
Now let’s consider the concrete syntaxes for the Expression language–first infix, then prefix.
An infix syntax for expressions is a syntax in which most binary operators appear between their operands as we tend to write them in mathematics and in programming languages such as Java and Haskell. For example, the following are intended to be valid infix expressions:
3
-3
x
1+1
x + 3
(x + y) * (2 + z)
For example, we can present the concrete syntax of our core Expression Language with the grammar below. Here we just consider expressions made up of decimal integer constants; variable names; binary operators for addition, subtraction, multiplication, and division; and parentheses to delimit nested expressions.
We express the upper levels of the infix expression’s syntax with the following context-free grammar where <expression>
is the start symbol.
<expression> ::= <term> { <addop> <term> }
<term> ::= <factor> { <mulop> <factor> }
<factor> ::= <var> | <val>
| '(' <expression> ')'
<val> ::= [ '-' ] <unsigned>
<var> ::= <id>
<addop> ::= '+' | '-'
<mulop> ::= '*' | '/'
Normally we want operators such as multiplication and division to bind more tightly than addition and subtraction. That is, we want expression x + y * z
to have the same meaning as x + (y * z)
. To accomplish this in the context-free grammar, we position <addop>
in a higher-level grammar rule than <mulop>
.
We can express the lower (lexical) level of the expression’s grammar with the following production rules:
<id> ::= <firstid> | <firstid> <idseq>
<idseq> ::= <restid> | <restid> <idseq>
<firstid> ::= <alpha> | '_'
<restid> ::= <alpha> | '_' | <digit>
<unsigned> ::= <digit> | <digit> <unsigned>
<digit> ::= any numeric character
<alpha> ::= any alphabetic character
The variables <digit>
and <alpha>
are essentially terminals. Thus the above is a regular grammar. (We can also add the rules for recognition of <addop>
and <mulop>
and rules for recognition of the terminals (
, )
, and -
to the regular grammar.)
We assume that identifiers and constants extend as far to the “right” as possible. That is, an <id>
begins with an alphabetic or underscore character and extends until it is terminated by some character other than an alphabetic, numeric, or underscore character (e.g., by whitespace or special character). Similarly for <unsigned>
.
Otherwise, the language grammar ignores whitespace characters (e.g., blanks, tabs, and newlines). The language also supports end of line comments, any characters on a line following a --
(double dash).
We can use a parsing program (i.e., a parser) to determine whether a concrete expression (e.g., 1 + 1
) satisfies the grammar and to build a corresponding parse tree.
Aside: In a previous subsection, we use the term derivation tree to refer to a tree that we construct from the root toward the leaves by applying production rules from the grammar. We usually call the same tree a parse tree if we construct it from the leaves (a sentence) toward the root.
Figure 2 shows the parse tree for infix expression 1 + 1
. It has <expression>
at its root. The children of a node in the parse tree depend upon the grammar rule application needed to generate the concrete expression. Thus the root <expression>
has either one child–a <term>
subtree–or three children– a <term>
subtree, an <addop>
subtree, and an <expression>
subtree.
If the parsing program returns a boolean result instead of building a parse tree, we sometimes call it a recognizer program.
An alternative is to use a parenthesized prefix syntax for the expressions. This is a syntax in which expressions involving operators are of the form
( op operands )
where op
denotes some “operator” and operands
denotes a sequence of zero or more expressions that are the arguments of the given operator. This is a syntax similar to the language Lisp.
In this syntax, the examples from the subsection on the infix syntax can be expressed something like:
3
3
x
(+ 1 1)
(+ x 3)
(* (+ x y) (+ 2 z))
We express the upper levels of a prefix expression’s syntax with the following context-free grammar, where <expression>
is the start symbol.
<expression> ::= <var> | <val> | <operexpr>
<var> ::= <id>
<val> ::= [ "-" ] <unsigned>
<operexpr> ::= '(' <operator> <operandseq> ')'
<operandseq> ::= { <expression> }
<operator> ::= '+' | '*' | '-' | '/' | ...
We can express the lower (lexical) level of the expression’s grammar with basically the same regular grammar as with the infix syntax. (We can also add the rule for recognition of <operator>
and for recognition of the terminals (
, )
, and -
to the regular grammar
The parse tree for prefix expression (+ 1 1)
is shown in Figure 3.
Because the prefix syntax expresses all operations in a fully parenthesized form, there is no need to consider the binding powers of operators. This makes parsing easier.
The prefix also makes extending the language to other operators–and keywords–much easier. Thus we will primarily use the prefix syntax in this and other cases studies.
We return to the problem of parsing expressions in a later subsection.
The abstract syntax of an expression seeks to represent only the essential aspects of the expression’s structure, ignoring nonessential, representation-dependent details of the concrete syntax.
For example, parentheses represent structural details in the concrete syntaxes given in the previous subsections. This structural information can be represented directly in the abstract syntax; there is no need for parentheses to appear in the abstract syntax.
We can represent arithmetic expressions conveniently using a tree data structure, where the nodes represent operations (e.g., addition) and leaves represent values (e.g., constants or variables). This representation is called a abstract syntax tree for the expression.
In Haskell, we can represent an abstract syntax trees using algebraic data types. Such types often enable us to express programs concisely by using pattern matching.
For the Expression Language, we define the Expr
data type (in the AbSynExpr
module) to describe the abstract syntax tree.
import Values ( ValType, Name )
data Expr = Add Expr Expr
| Sub Expr Expr
| Mul Expr Expr
| Div Expr Expr
| Var Name
| Val ValType
-- deriving Show?
instance Show Expr where
show (Val v) = show v
show (Var n) = n
show (Add l r) = showParExpr "+" [l,r]
show (Sub l r) = showParExpr "-" [l,r]
show (Mul l r) = showParExpr "+" [l,r]
show (Div l r) = showParExpr "/" [l,r]
showParExpr :: String -> [Expr] -> String
showParExpr op es =
"(" ++ op ++ " " ++ showExprList es ++ ")"
showExprList :: [Expr] -> String
showExprList es = Data.List.intercalate " " (map show es)
Above in type Expr
, the constructors Add
, Sub
, Mul
, and Div
represent the addition, subtraction, multiplication, and division, respectively, of the two operand subexpressions, Var
represents a variable with a name, and Val
represents a constant value.
Note that this abstract syntax is similar to the (Lisp-like) parenthesized prefix syntax described in the previous subsection.
We make type Expr
an instance of class Show
. We do not derive or define an instance of the Eq
class because direct structural equality of trees may not be how we want to define equality comparisons.
We can thus express the example expressions from the Concrete Syntax subsection as follows:
Val 3 -- 3
Val (-3) -- -3
Var "x" -- x
Add (Val 1) (Val 1) -- 1+1
Add (Var "x") (Val 3) -- x + 3
-- (x + y) * (2 - z)
Mul (Add (Var "x") (Var "y")) (Sub (Val 2) (Var "z"))
Figures 5 and 6 show abstract syntax trees for two example expressions above.
In the following chapter on parsing, we develop parsers for both the prefix and infix syntaxes. Both parsers construct abstract syntax trees using the algebraic data type Expr
.
The Expression Language restricts values to ValType
. The Values
module indirectly defines this type synonym to be Int
.
The abstract syntax allows a name to be represented by any string (i.e., type alias Name
, which is defined to be String
in the Values
module). We likely want to restrict names to follow the usual “identifier” syntax. The parser for the concrete syntax should enforce this restriction. Or we could define Haskell functions to parse and construct identifiers, such as the functions below.
import Data.Char ( isAlpha, isAlphaNum )
getId :: String -> (Name,String)
getId [] = ([],[])
getId xs@(x:_)
| isFirstId x = span isRestId xs
| otherwise = ([],xs)
where
isFirstId c = isAlpha c || c == '_'
isRestId c = isAlphaNum c || c == '_'
identifier :: String -> Maybe Name
identifier xs =
case getId xs of
(xs@(_:_),[]) -> Just xs
otherwise -> Nothing
The getId
function takes a string and parses an identifier at the beginning of the string. A valid identifier must begin with an alphabetic or underscore character and continue with zero or more alphabetic, numeric, or underscore characters.
The getId
function uses the higher order function span
to collect the characters that form the identifier. This function takes a predicate and returns a pair, of which the first component is the prefix string satisfying the predicate and the second is the remaining string.
In the following chapter, we examine how to parse an expression’s concrete syntax to build an abstract syntax tree.
TODO: Modify this subsection to account for the introduction of this concept in the Algebraic Data Type chapter.
In language processing, we often need to associate some key (e.g., a variable name) with its value. There are several names for this type of data structure – associative array, dictionary, map, symbol table, etc.
As we saw in a previous chapter, an association list is a simple list-based implementation of this concept. It is a list of pairs in which the first component is the key (e.g., a string) and the second component is the value associated with the key.
The Prelude function lookup
, shown below, searches an association list for a key and returns a Maybe
value. If it finds the key, it wraps the associated value in a Just
; if it does not find the key, it returns a Nothing
.
lookup :: (Eq a) => a -> [(a,b)] -> Maybe b
lookup _ [] = Nothing
lookup key ((x,y):xys)
| key == x = Just y
| otherwise = lookup key xys
For better performance with larger dictionaries, we can replace an association list by a more efficient data structure such as a Data.Map.Map
. This structure implements the dictionary structure as a size-balanced tree. It provides a lookup
function with essentially the same interface.
Of course, imperative languages might use a mutable hash table to implement a dictionary.
Consider the evaluation of the Expression Language’s abstract syntax trees as defined above.
To evaluate an expression, we must determine the current value of each variable occurring in the expression. That is, we must evaluate the expression in some environment that associates the variable names with their values.
For example, consider the expression x + 3
. It might be evaluated in an environment that associates the value 5
with the variable x
, written { x -> 5 }
. The evaluation of this expression yields the value 8
.
The environment { x -> 5 }
can be expressed in a number of ways in Haskell. Here we choose to represent it as a simple association list as follows:
[("x",5)]
This list associates a variable name in the first component with its integer value in the second component.
Looking up a key in an association list is an O(n)
operation where n
denotes the number of key-value pairs.
A good alternative to the association list is a Map
from the Data.Map
library. It implements the dictionary as a size-balanced tree, thus its lookup
function is an O(log n)
operation.
In the Expression Language implementation, we encapsulate the representation of the environment in the Environments
module. This module exports the following type synonym and functions:
type AnEnv a = [(Name,a)]
newEnv :: AnEnv a
toList :: AnEnv a -> [(Name,a)]
getBinding :: Name -> AnEnv a -> Maybe a
hasBinding :: Name -> AnEnv a -> Bool
newBinding :: Name -> a -> AnEnv a -> AnEnv a
setBinding :: Name -> a -> AnEnv a -> AnEnv a
bindList :: [(Name,a)] -> AnEnv a -> AnEnv a
For the purposes of our evaluation program, we can then define a specific environment with the type synonym Env
in the EvalExpr
module:
import Values ( ValType, Name, defaultVal )
import AbSynExpr ( Expr(..) )
import Environments ( AnEnv, Name, newEnv, toList, getBinding,
hasBinding, newBinding, setBinding, bindList )
type Env = AnEnv ValType
We express the semantics (i.e., meaning) of the various Expression Language expressions (i.e., nodes of the AST) as follows.
Val c
evaluates to the constant (NumType
) value c
.
Var n
evaluates to the value of variable n
in the environment, generating an error if the variable is not defined.
Add l r
evaluates to the sum of the values of the expression trees l
and r
.
Sub l r
evaluates to the difference between the values of the expression trees l
and r
.
Mul l r
evaluates to the product of the values of the expression trees l
and r
.
Div l r
evaluates to the quotient of the values of the expression trees l
and r
.
We can thus define a Haskell evaluation function (i.e., interpreter) for the Expression Language as follows.
This function in the EvalExpr
module does a post-order traversal of the abstract syntax tree, first computing the values of the child subexpressions and then computing the value of of a node. The value is returned wrapped in an Either
, where the Left
constructor represents and error message and the Right
constructor a good value.
import Values ( ValType, Name, defaultVal )
import AbSynExpr ( Expr(..) )
import Environments ( AnEnv, Name, newEnv, toList, getBinding,
hasBinding, newBinding, setBinding, bindList )
type EvalErr = String
type Env = AnEnv ValType
eval :: Expr -> Env -> Either EvalErr ValType
eval (Val v) _ = Right v
eval (Var n) env =
case getBinding n env of
Nothing -> Left ("Undefined variable " ++ n)
Just i -> Right i
eval (Add l r) env =
case (eval l env, eval r env) of
(Right lv, Right rv) -> Right (lv + rv)
(Left le, Left re ) -> Left (le ++ "\n" ++ re)
(x@(Left le), _ ) -> x
(_, y@(Left le)) -> y
eval (Sub l r) env =
case (eval l env, eval r env) of
(Right lv, Right rv) -> Right (lv - rv)
(Left le, Left re ) -> Left (le ++ "\n" ++ re)
(x@(Left le), _ ) -> x
(_, y@(Left le)) -> y
eval (Mul l r) env =
case (eval l env, eval r env) of
(Right lv, Right rv) -> Right (lv * rv)
(Left le, Left re ) -> Left (le ++ "\n" ++ re)
(x@(Left le), _ ) -> x
(_, y@(Left le)) -> y
eval (Div l r) env =
case (eval l env, eval r env) of
(Right _, Right 0 ) -> Left "Division by 0"
(Right lv, Right rv) -> Right (lv `div` rv)
(Left le, Left re ) -> Left (le ++ "\n" ++ re)
(x@(Left le), _ ) -> x
(_, y@(Left le)) -> y
Consider an example with a simple main
function below (that could be added to the EvalExpr
module) that evaluates the example expressions from a previous section. (See the extended EvalExpr
module.)
main =
do
let env = [("x",5), ("y",7),("z",1)]
let exp1 = Val 3 -- 3
let exp2 = Var "x" -- x
let exp3 = Add (Val 1) (Val 2) -- 1+2
let exp4 = Add (Var "x") (Val 3) -- x + 3
let exp5 = Mul (Add (Var "x") (Var "y"))
(Add (Val 2) (Var "z")) -- (x + y) * (2 + z)
putStrLn ("Expression: " ++ show exp1)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp1 env))
putStrLn ("Expression: " ++ show exp2)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp2 env))
putStrLn ("Expression: " ++ show exp3)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp3 env))
putStrLn ("Expression: " ++ show exp4)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp4 env))
putStrLn ("Expression: " ++ show exp5)
putStrLn ("Evaluation with x=5, y=7, z=1: "
++ show (eval exp5 env))
It first computes its value in the environment { x -> 5, y -> 7 }
and then computes its derivative relative to x
and then to y
.
Expression: 3
Evaluation with x=5, y=7, z=1: Right 3
Expression: x
Evaluation with x=5, y=7, z=1: Right 5
Expression: (+ 1 2)
Evaluation with x=5, y=7, z=1: Right 3
Expression: (+ x 3)
Evaluation with x=5, y=7, z=1: Right 8
Expression: (* (+ x y) (+ 2 z))
Evaluation with x=5, y=7, z=1: Right 36
TODO: Write this section
Read expression from command line
Evaluate expression after parsing
Print resulting value
Loop back for next expression
An expression may be more complex than necessary. We can simplify it, especially with the intention “optimizing” its evaluation.
An operation whose operands are constants can be simplified by replacing it by the appropriate constant. For example, Add (Val 3) (Val 4)
is the same as Val 7
.
Similarly, we can take advantages of an operation’s identity element and other mathematical properties to simplify expressions. For example, Add (Val 0) (Var "x")
is the same as Var "x"
.
We can thus define a skeletal function simplify
as follows. As with eval
, the simplify
function traverses the abstract syntax tree using a post-order traversal.
simplify :: Expr -> Expr
simplify (Add l r) =
case (simplify l, simplify r) of
(Val 0, rr) -> rr
(ll, Val 0) -> ll
(Val x, Val y) -> Val (x+y)
(ll, rr) -> Add ll rr
simplify (Mul l r) =
case (simplify l, simplify r) of
(Val 0, rr) -> Val 0
(ll, Val 0) -> Val 0
(Val 1, rr) -> rr
(ll, Val 1) -> ll
(Val x, Val y) -> Val (x*y)
(ll, rr) -> Mul ll rr
simplify t@(Var _) = t
simplify t@(Val _) = t
In an exercise, you are asked to complete the development of this function.
See the incomplete Process AST module ProcessAST
for the sample code in this subsection and the following one.
Suppose that we redefine the Expr
type to support double precision floating point (i.e., Double
) values.
Then let’s consider symbolic differentiation of the arithmetic expressions. Thinking back to our study of differential calculus, we identify the following rules for differentiation:
The derivative of a sum is the sum of the derivatives.
The derivative of a product of two operands is the sum of the product of (a) the first operand and the derivative of the second and (b) the second operand and the derivative of the first.
The derivative of some variable v
is 1 if differentiation is relative to v
and is 0 otherwise.
The derivative of a constant is 0.
We can directly translate these rules into a skeletal Haskell function that uses the above data types, as follows:
deriv :: Expr -> Name -> Expr
deriv (Add l r) v = Add (deriv l v) (deriv r v)
deriv (Mul l r) v = Add (Mul l (deriv r v)) (Mul r (deriv l v))
deriv (Var n) v
| v == n = Val 1
deriv _ _ = Val 0
An Expression Language interpreter consists of 7 modules with the module dependencies shown in Figure 6.
The Haskell source code for the Expression Language modules discussed in this chapter are linked below:
EvalExpr
AbSynExpr
Environments
Values
In addition, the code for the partially implemented Process AST module includes the skeleton simplify
and deriv
code:
ProcessAST
(simplify and deriv)This module is “wrapper” for the EvalExpr
module currently.
The source code for the remaining Expression Language modules are discussed in the next chapter. These are linked below:
Lexical_analyzer module LexExpr
common to both prefix and infix parsers
Prefix syntax
Recursive descent Parser module for prefix language ParsePrefixExpr
REPL module for prefix syntax PrefixExprREPL
Infix syntax
Recursive descent Parser module for infix language ParseInfixExpr
REPL module for infix syntax InfixExprREPL
The main entry points for use of the interpreters are the REPLs.
Extend the abstract syntax tree data type Expr
(in the Abstract Syntax module) to add new operations Neg
(negation), Min
(minimum), Max
(maximum), and Exp
(exponentiation).
data Expr = ...
| Neg Expr
| Min Expr Expr
| Max Expr Expr
| Exp Expr Expr
...
deriving Show
Then extend the eval
function (in the EvalExpr
module) to add these new operations with the following informal semantics:
Neg e
negates the value of e
– Neg (Val 1)
is (Val (-1))
Min l r
returns the smaller value of l
and r
Max l r
returns the larger value of l
and r
Exp l r
raises l
to the poewr r
Extend the simplify
function to support Sub
, Div
, and the new operations given in the previous exercise.
This function should simplify the abstract syntax tree by evaluating subexpressions involving only constants (not evaluating variables) and handling special values like identity and zero elements.
Extend the simplify
function from the previous exercise in other ways. For example, take advantage of mathematical properties such as associativity ((x + y) + z = x + (y + z)
), commutativity (x + 1 = 1 + x
), and idempotence (x min x = x
).
Extend the abstract syntax tree data type Expr
to include the binary operators Eq
(equality) and Lt
(less-than comparison), logical unary operator Not
, and the ternary conditional expression If
(if-then-else).
data Expr = ...
| Eq Expr Expr
| Lt Expr Expr
| Not Expr
| If Expr Expr Expr
...
deriving Show
Then extend the eval
function to implement these new operations.
We can express the informal semantics of the new Expression Language expressions as follows:
Eq l r
evaluates to the value 1 if l
and r
have the same value and to 0 otherwise.
Lt l r
evaluates to the value 1 if the value of l
is smaller than the value of r
and to 0 otherwise.
Not i
evaluates to 1 if i
is zero and evaluates to 0 if i
is nonzero.
If c l r
first evaluates c
; if c
is nonzero, the if
evaluates to the value of l
; otherwise the if
evaluates to the value of r
.
Note: The constants falseVal
and trueVal
and the functions boolToVal
and valToBool
in the Values
module may be helpful. (The intention of the Values
module is to keep the representation of the values hidden from the rest of the interpreter. In particular, these constants and functions these are to help encapsulate the representation of booleans as the underlying values.)
Develop an object-oriented program (e.g., in Java) to carry out the same functionality as the Expr
data type and eval
function described in this chapter. That is, define a class hierarchy that corresponds to the Expr
data type and use the message-passing style to implement the needed classes and instances.
Extend the object-oriented program from the previous exercise to implement the Neg
, Min
, Max
, and Exp
as described in an earlier exercise.
Extend the object-oriented program from the previous exercise to implement the Eq
, Lt
, Not
, and If
as described in another earlier exercise.
Extend the object-oriented program above to implement simplification.
For this exercise, redefine the Expr
data type above to hold Double
constants instead of Int
. In addition to Add
, Mul
, Sub
, Div
, Neg
, Min
, Max
, and Exp
, extend the data type and eval
function to include the trigonometric operators Sin
and Cos
for sine and cosine.
Using the extended Double
-version of Expr
from the previous exercise, extend function deriv
to support all the operators in the data type.
I initially developed this case study for the Haskell-based offering of CSci 556 (Multiparadigm Programming) in Spring 2017. I continued work on it during Summer and Fall 2017. I based this work, in part, on ideas from:
the Scala-based Expression Tree Calculator case study from my Notes on Scala for Java Programmers (which was itself adapted from the tutorial Scala for Java Programmers in Spring 2016)
the Lua-based Expression Language 1 and Imperative Core interpreters I developed for the Fall 2016 CSci 450 course
Samuel N. Kamin’s textbook Programming Languages: An Interpreter-based Approach, Addison Wesley, 1990 and my work to implement three (Core, Lisp, and Scheme) of these interpeters in Lua in 2013
Sections 1.2, 3.3, and 5.1 of Peter Linz’s textbook An Introduction to Formal Languages and Automata, Fifth Edition, Jones and Bartlett, 2017
Section 1.3 of Peter Sestoff’s Programming Language Concepts, Springer, 2012.
the Wikipedia articles on Regular Grammar, Context-Free Grammar, Backus-Naur Form, and Abstract Syntax
I maintain these notes as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the notes to HTML, PDF, and other forms as needed, The HTML version of this document may require use of a browser that supports the display of MathML.
TODO