Copyright (C) 2016, H. Conrad Cunningham
Acknowledgements: I adapted and revised much of this work from a previous HTML-source handout on domain specific languages.
Advisory: The HTML version of this document requires use of a browser that supports the display of MathML. A good choice as of June 2016 is a recent version of Firefox from Mozilla.
Few computer science graduates will work on the design and implementation of general-purpose programming languages during their careers. However, many graduates will need to design and implement--and all likely will need to use--special-purpose languages as a part of their day-to-day work. These special-purpose languages are often called domain-specific languages [Wikipedia].
Hudak defines a domain-specific language (or DSL) as "a programming language tailored to a particular application domain" [Hudak 1998], that is, to a particular kind of problem.
General-purpose languages (GPLs), such as Java, C, and Haskell, seek to be broadly applicable across many domains. They can, in theory, compute any function that is computable by a finite procedure; they are said to be Turing complete.
DSLs might be Turing complete, but often they are not. DSLs are little languages [Bentley 1986] that "trade generality for expressiveness" [Mernik 2005].
Ideally, a DSL should enable experts in an application area to program without programming--that is, to express the problems they want the computer to solve using familiar concepts and notations, without having to master the intricacies of programming in a general-purpose language [Hudak 1998, van Deursen 2000].
For example, the DSL pic (long available on Unix-based computers) enables writers to produce line drawings in documents; they can focus on the layout of the drawings without being required to develop programs in C (the primary general-purpose language used on Unix) [Bentley 1986].
Other DSLs on the Unix platform include lex (for generating lexical analyzers), yacc (for generating parsers), and make (for building software from its various sources).
The markup language HTML is also a DSL for formatting documents on the Web.
The designers of a DSL must select relevant concepts, notations, and processes from the application domain and incorporate them into the DSL design [Hudak 1998].
Fowler classifies DSLs into two styles, external and internal [Fowler 2008, 2011]. Although the terminology is relatively new, the ideas are not.
An external DSL is a language that is different from the main programming language for an application, but that is interpreted by or translated into a program in the main language. The external DSL is a standalone language with its own syntax and semantics.
The Unix little languages pic, lex, yacc, and make exhibit this style. They are separate textual languages with their own syntax and semantics, but they are processed by C programs (and may also generate C programs).
Implementations of external DSLs may parse their input using ad hoc techniques (e.g., hand-coded recursive descent parsers), parser-generation tools (e.g., lex and yacc in the C/Unix environment and Happy on the Haskell Platform), or parsing libraries (e.g., the Haskell library Parsec and the Lua library LPeg).
Example external DSLs include the Scala-based Secret Panel Controller DSLs and the Lua-based Lair External DSL in the instructor's notes.
An internal DSL transforms the main programming language itself into the DSL--the DSL is embedded in the main language [Fowler 1998].
The techniques for constructing internal DSLs vary from language to language.
The language Lisp (which was defined in the 1960s) supports syntactic macros, a convenient mechanism for extending the language by adding application-specific features that are expanded at compile time.
Internal DSLs in the language Ruby exploit the language's flexible syntax and runtime reflective metaprogramming facilities. The Ruby on Rails web framework includes several such internal DSLs.
Haskell's algebraic type system has stimulated research on "embedded" DSLs for several domains including reactive animation and music [Hudak 1998, 2000].
In object-oriented languages, internal DSLs may also exploit object structures and subtyping. Example internal DSLs include the Scala-based Computer Configuration and Email Message Building DSLs and the Lua-based Lair DSLs in the instructor's notes.
In a shallow embedding of an internal DSL, the implementation's types and data structures directly represent the semantics of the domain but do not represent the syntactic structure of the domain objects.
For example, the regular expression package from the Thompson Haskell textbook [Thompson 2011], section 12.3 (CSci 450 Assignment #3 in Fall 2014), is a shallow embedding of the regular expression concept. It models the semantics but not the syntax of regular expressions. It uses functions to represent the regular expressions and higher order functions (combinators) to combine the regular expressions in valid ways.
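To make the idea concrete, the following is a minimal sketch of a shallow embedding written for these notes. It is not the textbook's code; the names `RegExp`, `epsilon`, `char`, `alt`, `sqn`, `star`, and `splits` are assumptions chosen for illustration. A regular expression is represented directly by its meaning (a predicate on strings), and the combinators build larger predicates from smaller ones.

```haskell
-- Shallow embedding (illustrative sketch; all names are assumptions).
-- A regular expression IS its meaning: a predicate that decides
-- whether an entire string matches.
type RegExp = String -> Bool

-- Matches only the empty string.
epsilon :: RegExp
epsilon = null

-- Matches exactly the one-character string [c].
char :: Char -> RegExp
char c = (== [c])

-- Alternation: matches if either operand matches.
alt :: RegExp -> RegExp -> RegExp
alt r1 r2 s = r1 s || r2 s

-- Sequence: matches if some split of the string satisfies r1, then r2.
sqn :: RegExp -> RegExp -> RegExp
sqn r1 r2 s = or [ r1 front && r2 back | (front, back) <- splits s ]

-- Kleene star: zero or more repetitions of r.
-- Only nonempty front pieces are tried, so the recursion terminates.
star :: RegExp -> RegExp
star r s =
  null s || or [ r front && star r back | (front, back) <- drop 1 (splits s) ]

-- All ways to split a list into a front part and a back part.
splits :: [a] -> [([a], [a])]
splits xs = [ splitAt n xs | n <- [0 .. length xs] ]
```

For instance, `star (sqn (char 'a') (char 'b'))` is a function that accepts `"abab"` and rejects `"aba"`. Because the representation is just a function, there is nothing left to inspect once an expression has been built; all one can do is apply it to a string.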
Similarly, the Scala-based Computer Configuration and Email Message Building DSLs and the Lua-based Lair DSLs are relatively shallow embeddings of the DSLs.
In a deep embedding of an internal DSL, the implementation's types and data structures model both the syntax and semantics of the domain. That is, it represents the domain objects using abstract syntax trees.
For example, section 19.4 of the Thompson Haskell textbook [Thompson 2011] redesigns the regular expression package as a deep embedding. It introduces types that represent the syntactic structure of the regular expressions as well as their semantics.
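A deep embedding along these lines might look like the sketch below. It is written for these notes and is not the textbook's code; the type `RE`, its constructors, and the functions `matches`, `render`, and `splits` are assumed names. The data type records the syntactic structure, and each interpretation of that structure is a separate function over the tree.

```haskell
-- Deep embedding (illustrative sketch; all names are assumptions).
-- The data type is an abstract syntax tree for regular expressions.
data RE
  = Eps          -- the empty string
  | Ch Char      -- a single character
  | Alt RE RE    -- alternation  r1 | r2
  | Seq RE RE    -- sequence     r1 r2
  | Star RE      -- repetition   r*
  deriving (Eq, Show)

-- One interpretation of the tree: does the whole string match?
matches :: RE -> String -> Bool
matches Eps         s = null s
matches (Ch c)      s = s == [c]
matches (Alt r1 r2) s = matches r1 s || matches r2 s
matches (Seq r1 r2) s =
  or [ matches r1 front && matches r2 back | (front, back) <- splits s ]
matches (Star r)    s =
  null s ||
  or [ matches r front && matches (Star r) back
     | (front, back) <- drop 1 (splits s) ]   -- nonempty fronts only

-- A second interpretation, available only because the syntax is kept:
-- render the tree as fully parenthesized text.
render :: RE -> String
render Eps         = "()"
render (Ch c)      = [c]
render (Alt r1 r2) = "(" ++ render r1 ++ "|" ++ render r2 ++ ")"
render (Seq r1 r2) = render r1 ++ render r2
render (Star r)    = "(" ++ render r ++ ")*"

-- All ways to split a list into a front part and a back part.
splits :: [a] -> [([a], [a])]
splits xs = [ splitAt n xs | n <- [0 .. length xs] ]
```

With `abStar = Star (Seq (Ch 'a') (Ch 'b'))`, the call `matches abStar "abab"` succeeds and `render abStar` yields `"(ab)*"`. The same value supports both uses because it is data, not a function.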
What about the arithmetic expression language given in William Cook's Anatomy of Programming Languages? [The Fall 2015 CSci 450 class used this set of notes for some of its study.]
The concrete syntax and semantics of the expression language are different from those of Haskell, so at that level it is an external DSL. The parser recognizes a valid arithmetic expression in the input text and creates an appropriate abstract syntax tree inside the Haskell program. The abstract syntax tree differs from the textual arithmetic expression and from its parse tree.
However, the abstract syntax tree itself can be considered a deep embedding of an internal DSL for the abstract syntax. For the remainder of the processing of the abstract expression, the abstract syntax tree preserves the important syntactic (structural) features of the arithmetic expressions.
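The two levels can be seen in a small sketch. This is not Cook's code; the grammar, the type `Expr`, and the functions `parseExpr` and `eval` are assumptions made for illustration, and the language is cut down to natural numbers, `+`, `*`, and parentheses. A hand-coded recursive descent parser handles the external (textual) level and produces the tree; `eval` then works purely on the tree.

```haskell
import Data.Char (isDigit, isSpace)

-- Abstract syntax (illustrative sketch; all names are assumptions).
data Expr
  = Num Integer
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- Semantics defined over the abstract syntax tree.
eval :: Expr -> Integer
eval (Num n)   = n
eval (Add l r) = eval l + eval r
eval (Mul l r) = eval l * eval r

-- Recursive descent parser for the assumed concrete syntax:
--   expr   ::= term   { '+' term }
--   term   ::= factor { '*' factor }
--   factor ::= digits | '(' expr ')'
-- A parser returns the tree and the unconsumed input, or Nothing.
type Parser a = String -> Maybe (a, String)

expr :: Parser Expr
expr s = do
  (t, rest) <- term s
  exprTail t rest

-- Consume zero or more "+ term" suffixes, associating to the left.
exprTail :: Expr -> Parser Expr
exprTail acc s = case skip s of
  '+' : rest -> do
    (t, rest') <- term rest
    exprTail (Add acc t) rest'
  _ -> Just (acc, s)

term :: Parser Expr
term s = do
  (f, rest) <- factor s
  termTail f rest

-- Consume zero or more "* factor" suffixes, associating to the left.
termTail :: Expr -> Parser Expr
termTail acc s = case skip s of
  '*' : rest -> do
    (f, rest') <- factor rest
    termTail (Mul acc f) rest'
  _ -> Just (acc, s)

factor :: Parser Expr
factor s = case skip s of
  '(' : rest -> do
    (e, rest') <- expr rest
    case skip rest' of
      ')' : rest'' -> Just (e, rest'')
      _            -> Nothing
  s'@(c : _) | isDigit c ->
    let (digits, rest) = span isDigit s'
    in  Just (Num (read digits), rest)
  _ -> Nothing

-- Drop leading whitespace.
skip :: String -> String
skip = dropWhile isSpace

-- Parse a complete input; fail if anything but whitespace is left over.
parseExpr :: String -> Maybe Expr
parseExpr s = case expr s of
  Just (e, rest) | null (skip rest) -> Just e
  _                                 -> Nothing
```

Here `parseExpr "1 + 2 * (3 + 4)"` yields `Just (Add (Num 1) (Mul (Num 2) (Add (Num 3) (Num 4))))`. The parentheses and whitespace of the text are gone, but the nesting they expressed is preserved in the tree, and `fmap eval` on that result gives `Just 15`.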
The advantage of a shallow embedding is that it provides a simple implementation of the semantics of the domain. It is usually straightforward to modify the semantics by adding new operations. If these capabilities are all that one needs, then a shallow embedding is convenient.
The advantage of a deep embedding is that, in addition to manipulating the semantics of the domain, one can also manipulate the syntactic representation of the domain objects. The syntactic representation can be analyzed, transformed, and translated in many ways that are not possible with a shallow embedding.
For example, the deep embedding of the regular expression DSL can enable replacement of one regular expression by an equivalent simpler one, such as replacing (a*)* by a*.
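A simplifier of this kind can be sketched as a function from syntax trees to syntax trees. The code below is an illustration written for these notes, not the textbook's implementation; the type `RE` and the function `simplify` are assumed names, and the type is stated in full so that the block stands alone. It applies only a few language-preserving identities and makes no claim to produce a smallest equivalent expression.

```haskell
-- Abstract syntax for regular expressions
-- (illustrative sketch; all names are assumptions).
data RE
  = Eps          -- the empty string
  | Ch Char      -- a single character
  | Alt RE RE    -- alternation  r1 | r2
  | Seq RE RE    -- sequence     r1 r2
  | Star RE      -- repetition   r*
  deriving (Eq, Show)

-- Rewrite bottom-up: simplify the subexpressions first,
-- then apply an identity at the current node if one fits.
simplify :: RE -> RE
simplify (Star r) = case simplify r of
  Star r' -> Star r'            -- (r*)*   = r*
  Eps     -> Eps                -- ()*     = ()
  r'      -> Star r'
simplify (Seq r1 r2) = case (simplify r1, simplify r2) of
  (Eps, r2') -> r2'             -- () r    = r
  (r1', Eps) -> r1'             -- r ()    = r
  (r1', r2') -> Seq r1' r2'
simplify (Alt r1 r2)
  | r1' == r2' = r1'            -- r | r   = r
  | otherwise  = Alt r1' r2'
  where
    r1' = simplify r1
    r2' = simplify r2
simplify r = r                  -- Eps and Ch are already simple
```

Applying `simplify` to `Star (Star (Ch 'a'))` gives `Star (Ch 'a')`, which is the replacement of (a*)* by a* described above. No comparable function can be written against a representation that is only a predicate on strings, since a predicate offers no structure to pattern match on.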
Of course, the disadvantage of deep embeddings is that they are more complex to develop, understand, and modify.
The accompanying Sandwich DSL case study [Haskell] [Scala] illustrates how one can create a DSL in a simple situation.