CSci 450-01: Organization of Programming Languages
CSci 503-01: Fundamental Concepts in Languages
Fall 2014


Domain Specific Languages (DSLs)

What are DSLs?

Few computer science graduates will work on the design and implementation of general-purpose programming languages during their careers. However, many graduates will need to design and implement--and all likely will need to use--special-purpose languages as a part of their day-to-day work. These special-purpose languages are often called domain-specific languages [Wikipedia].

Hudak defines a domain-specific language (or DSL) as "a programming language tailored to a particular application domain" [Hudak 1998], that is, to a particular kind of problem.

General-purpose languages (GPLs), such as Java, C, and Haskell, seek to be broadly applicable across many domains. They can, in theory, compute any function that is computable by a finite procedure; they are said to be Turing complete.

DSLs might be Turing complete, but often they are not. DSLs are little languages [Bentley 1986] that "trade generality for expressiveness" [Mernik 2005].

Ideally, a DSL should enable experts in an application area to program without programming--that is, to express the problems they want the computer to solve using familiar concepts and notations, without having to master the intricacies of programming in a general-purpose language [Hudak 1998, van Deursen 2000].

For example, the DSL pic (long available on Unix-based computers) enables writers to produce line drawings in documents; they can focus on the layout of the drawings without being required to develop programs in C (the primary general-purpose language used on Unix) [Bentley 1986].

Other DSLs on the Unix platform include lex (for generating lexical analyzers), yacc (for generating parsers), and make (for building software from its various sources).

The markup language HTML is also a DSL for formatting documents on the Web.

The designers of a DSL must select relevant concepts, notations, and processes from the application domain and incorporate them into the DSL design [Hudak 1998].

External and Internal DSLs

Fowler classifies DSLs into two styles: external and internal [Fowler 2008, 2011]. Although the terminology is relatively new, the ideas are not.

An external DSL is a language that is different from the main programming language for an application, but that is interpreted by or translated into a program in the main language. The external DSL is a standalone language with its own syntax and semantics.

The Unix little languages pic, lex, yacc, and make exhibit this style. They are separate textual languages with their own syntax and semantics, but they are processed by C programs (and may also generate C programs).

External DSLs may use ad hoc techniques (e.g., hand-coded recursive descent parsers), parser-generation tools (e.g., lex and yacc in the C/Unix environment and Happy on the Haskell Platform), or parsing libraries (e.g., the Haskell library Parsec and the Lua library LPeg).
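As a minimal illustration of the ad hoc approach, the following Haskell sketch hand-codes a recursive descent parser for a hypothetical external DSL of integer expressions built from +, *, and parentheses. The grammar and every name in it (expr, term, factor, evalProgram) are assumptions made for this example. The sketch uses none of lex, yacc, Happy, Parsec, or LPeg, and for brevity it computes values while parsing instead of building a tree.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical grammar for this sketch:
--   expr   ::= term   { "+" term }
--   term   ::= factor { "*" factor }
--   factor ::= digits | "(" expr ")"
-- Each function consumes a prefix of the input and returns the value
-- computed so far together with the remaining input.
type Parser a = String -> Maybe (a, String)

-- Drop leading whitespace before looking at the next token.
skip :: String -> String
skip = dropWhile isSpace

expr :: Parser Int
expr s = term s >>= uncurry moreTerms
  where
    moreTerms acc rest = case skip rest of
      ('+' : rest') -> term rest' >>= \(v, r) -> moreTerms (acc + v) r
      _             -> Just (acc, rest)

term :: Parser Int
term s = factor s >>= uncurry moreFactors
  where
    moreFactors acc rest = case skip rest of
      ('*' : rest') -> factor rest' >>= \(v, r) -> moreFactors (acc * v) r
      _             -> Just (acc, rest)

factor :: Parser Int
factor s = case skip s of
  ('(' : rest) -> case expr rest of
    Just (v, r) | (')' : r') <- skip r -> Just (v, r')
    _                                  -> Nothing
  cs@(c : _) | isDigit c -> let (ds, r) = span isDigit cs in Just (read ds, r)
  _ -> Nothing

-- Evaluate a whole program; leftover non-blank input is an error.
evalProgram :: String -> Maybe Int
evalProgram s = case expr s of
  Just (v, rest) | all isSpace rest -> Just v
  _                                 -> Nothing
```

For instance, evalProgram "2 + 3 * (4 + 1)" yields Just 17, while malformed input such as "2 +" yields Nothing. There is one function per grammar rule, which is what makes the technique practical to write by hand for a small language.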

An internal DSL transforms the main programming language itself into the DSL: the DSL is embedded in the main language [Fowler 2008].

The techniques for constructing internal DSLs vary from language to language.

The language Lisp (which was defined in the 1960s) supports syntactic macros, a convenient mechanism for extending the language by adding application-specific features that are expanded at compile time.

Internal DSLs in the language Ruby exploit the language's flexible syntax and runtime reflective metaprogramming facilities. The Ruby on Rails web framework includes several such internal DSLs.

Haskell's algebraic type system has stimulated research on "embedded" DSLs for several domains including reactive animation and music [Hudak 1998, 2000].

Shallow and Deep Embeddings of Internal DSLs

In a shallow embedding of an internal DSL, the implementation's types and data structures directly represent the semantics of the domain but do not represent the syntactic structure of the domain objects.

For example, the regular expression package from the Thompson textbook section 12.3 (Assignment #3 in 2014) is a shallow embedding of the regular expression concept. It models the semantics but not the syntax of regular expressions. It uses functions to represent the regular expressions and higher-order functions (combinators) to combine the regular expressions in valid ways.
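A minimal sketch of such a shallow embedding in Haskell follows. It is not the textbook's code. The combinator names (epsilon, char, orElse, andThen, star) are assumptions for illustration and may differ from Thompson's. Each regular expression is represented directly by its meaning, a predicate on strings, so nothing records how an expression was built.

```haskell
-- Shallow embedding: a regular expression *is* its meaning.
type RegExp = String -> Bool

-- Matches only the empty string.
epsilon :: RegExp
epsilon = null

-- Matches exactly the one-character string [c].
char :: Char -> RegExp
char c = (== [c])

-- Alternation: match if either expression matches.
orElse :: RegExp -> RegExp -> RegExp
orElse e1 e2 = \s -> e1 s || e2 s

-- Concatenation: some split of the string matches e1 then e2.
andThen :: RegExp -> RegExp -> RegExp
andThen e1 e2 = \s -> or [e1 xs && e2 ys | (xs, ys) <- splits s]

-- Kleene star: the empty string, or a non-empty prefix matching e
-- followed by a suffix matching star e. Requiring a non-empty prefix
-- guarantees termination even if e accepts the empty string.
star :: RegExp -> RegExp
star e = \s -> null s || or [e xs && star e ys | (xs, ys) <- drop 1 (splits s)]

-- All ways to split a string into a prefix and a suffix.
splits :: String -> [(String, String)]
splits s = [splitAt n s | n <- [0 .. length s]]

-- Hypothetical example: (a|b)*c
abStarC :: RegExp
abStarC = star (char 'a' `orElse` char 'b') `andThen` char 'c'
```

Here abStarC accepts "abac" and "c" but rejects "abca". Because abStarC is just a function, a program can apply it to strings but cannot inspect its structure, for example to print or simplify it.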

In a deep embedding of an internal DSL, the implementation's types and data structures model both the syntax and semantics of the domain. That is, it represents the domain objects using abstract syntax trees.

For example, section 19.4 of the Thompson textbook redesigns the regular expression package as a deep embedding. It introduces types that represent the syntactic structure of the regular expressions as well as their semantics.
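For comparison, a minimal deep-embedding sketch appears below. It follows the same general idea, but the type, constructor, and function names (Reg, Epsilon, Literal, Or, Then, Star, matches, render) are assumptions and may not match the textbook. The data type records the syntax. The function matches gives the same string-matching semantics as a shallow embedding, while render is a purely syntactic operation that a shallow embedding cannot express.

```haskell
-- Deep embedding: the data type records the syntactic structure
-- of a regular expression (names are assumptions for this sketch).
data Reg
  = Epsilon          -- matches only the empty string
  | Literal Char     -- matches a single character
  | Or Reg Reg       -- alternation
  | Then Reg Reg     -- concatenation
  | Star Reg         -- Kleene star
  deriving (Eq, Show)

-- Semantics: interpret a tree as a predicate on strings.
matches :: Reg -> String -> Bool
matches Epsilon      s = null s
matches (Literal c)  s = s == [c]
matches (Or r1 r2)   s = matches r1 s || matches r2 s
matches (Then r1 r2) s =
  or [matches r1 xs && matches r2 ys | (xs, ys) <- splits s]
matches (Star r)     s =
  -- A non-empty first chunk keeps recursion on ever-shorter suffixes.
  null s || or [matches r xs && matches (Star r) ys | (xs, ys) <- drop 1 (splits s)]

-- All ways to split a string into a prefix and a suffix.
splits :: String -> [(String, String)]
splits s = [splitAt n s | n <- [0 .. length s]]

-- Syntax: because the structure is kept, the expression can be printed.
render :: Reg -> String
render Epsilon      = "()"
render (Literal c)  = [c]
render (Or r1 r2)   = "(" ++ render r1 ++ "|" ++ render r2 ++ ")"
render (Then r1 r2) = render r1 ++ render r2
render (Star r)     = atom r ++ "*"
  where
    -- Parenthesize the starred operand unless it is already atomic.
    atom x@(Literal _) = render x
    atom x@(Or _ _)    = render x
    atom Epsilon       = render Epsilon
    atom x             = "(" ++ render x ++ ")"

-- Hypothetical example: (a|b)*c
ex :: Reg
ex = Then (Star (Or (Literal 'a') (Literal 'b'))) (Literal 'c')
```

With these definitions, matches ex "abac" is True and render ex produces "(a|b)*c". Likewise, render (Star (Star (Literal 'a'))) produces "(a*)*".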

What about the arithmetic expression language given in William Cook's Anatomy of Programming Languages?

The concrete syntax and semantics of the expression language are different from Haskell's, so at that level it is an external DSL. The parser recognizes a valid arithmetic expression in the input text and creates an appropriate abstract syntax tree inside the Haskell program. The abstract syntax tree differs from both the textual arithmetic expression and its parse tree.

However, the abstract syntax tree itself can be considered a deep embedding of an internal DSL for the abstract syntax. For the remainder of the processing of the abstract expression, the abstract syntax tree preserves the important syntactic (structural) features of the arithmetic expressions.
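The following sketch suggests what such an abstract syntax might look like in Haskell. The constructor names (Number, Add, Subtract, Multiply) and the functions evaluate and operators are assumptions in the spirit of Cook's notes, not copied from them. The tree keeps the grouping of the source text as its shape while discarding concrete details such as parentheses and spacing, so later phases can both evaluate it and analyze its structure.

```haskell
-- Abstract syntax for a small arithmetic language
-- (constructor names are assumptions for this sketch).
data Exp
  = Number Int
  | Add Exp Exp
  | Subtract Exp Exp
  | Multiply Exp Exp
  deriving (Eq, Show)

-- The text "(1 + 2) * 3" might be parsed into this tree: the
-- parentheses are gone, but the grouping survives as the tree's shape.
example :: Exp
example = Multiply (Add (Number 1) (Number 2)) (Number 3)

-- Semantics: evaluate a tree to an integer.
evaluate :: Exp -> Int
evaluate (Number n)     = n
evaluate (Add a b)      = evaluate a + evaluate b
evaluate (Subtract a b) = evaluate a - evaluate b
evaluate (Multiply a b) = evaluate a * evaluate b

-- A structural analysis: count the operators in an expression.
operators :: Exp -> Int
operators (Number _)     = 0
operators (Add a b)      = 1 + operators a + operators b
operators (Subtract a b) = 1 + operators a + operators b
operators (Multiply a b) = 1 + operators a + operators b
```

Here evaluate example yields 9 and operators example yields 2. The second function works on the syntactic structure rather than on values, which is exactly what the deep embedding preserves.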

The advantage of a shallow embedding is that it provides a simple implementation of the semantics of the domain. It is usually straightforward to modify the semantics by adding new operations. If these capabilities are all that one needs, then a shallow embedding is convenient.

The advantage of a deep embedding is that, in addition to manipulating the semantics of the domain, one can also manipulate the syntactic representation of the domain objects. The syntactic representation can be analyzed, transformed, and translated in many ways that are not possible with a shallow embedding.

For example, the deep embedding of the regular expression DSL can enable replacement of one regular expression by an equivalent simpler one, such as replacing (a*)* by a*.
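A small sketch of such a transformation follows. It assumes a regular expression type with the hypothetical constructors Epsilon, Literal, Or, Then, and Star. The function simplify rewrites the tree bottom-up using a few identities that preserve the language matched: (r*)* = r*, (Epsilon)* = Epsilon, r|r = r, and concatenating Epsilon on either side of r gives r. It is an illustration, not a complete simplifier.

```haskell
-- Hypothetical deep embedding of regular expressions.
data Reg = Epsilon | Literal Char | Or Reg Reg | Then Reg Reg | Star Reg
  deriving (Eq, Show)

-- Simplify children first, then apply local rewrite rules.
simplify :: Reg -> Reg
simplify (Or r1 r2)   = simplifyOr (simplify r1) (simplify r2)
simplify (Then r1 r2) = simplifyThen (simplify r1) (simplify r2)
simplify (Star r)     = simplifyStar (simplify r)
simplify r            = r

simplifyStar :: Reg -> Reg
simplifyStar (Star r) = Star r    -- (r*)* = r*
simplifyStar Epsilon  = Epsilon   -- (Epsilon)* = Epsilon
simplifyStar r        = Star r

simplifyOr :: Reg -> Reg -> Reg
simplifyOr r1 r2
  | r1 == r2  = r1                -- r|r = r
  | otherwise = Or r1 r2

simplifyThen :: Reg -> Reg -> Reg
simplifyThen Epsilon r = r        -- Epsilon followed by r = r
simplifyThen r Epsilon = r        -- r followed by Epsilon = r
simplifyThen r1 r2     = Then r1 r2
```

For example, simplify (Star (Star (Literal 'a'))) yields Star (Literal 'a'), which is the rewrite of (a*)* to a*. A shallow embedding cannot perform this rewrite, because it has no tree to inspect.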

Of course, the disadvantage of deep embeddings is that they are more complex to develop, understand, and modify.

The accompanying Sandwich DSL case study illustrates how one can create a DSL in a simple situation.

References

[Bentley 1986]
J. Bentley. Programming pearls: Little languages. Communications of the ACM, 29(8):711-721, August 1986.
[Fowler 2008]
M. Fowler. DomainSpecificLanguage. Blog posting, 15 May 2008. http://www.martinfowler.com/bliki/DomainSpecificLanguage.html
[Fowler 2011]
M. Fowler. Domain-Specific Languages. Addison-Wesley, 2011.
[Hudak 1998]
P. Hudak. Modular domain specific languages and tools. In P. Devanbu and J. Poulin, editors, Proceedings of the 5th International Conference on Software Reuse (ICSR'98), pages 134-142. IEEE, 1998.
[Mernik 2005]
M. Mernik, J. Heering, and A. M. Sloane. When and how to develop domain-specific languages. ACM Computing Surveys, 37(4):316-344, December 2005.
[van Deursen 2000]
A. van Deursen, P. Klint, and J. Visser. Domain-specific languages: An annotated bibliography. SIGPLAN Notices, 35(6):26-36, June 2000.
[Wikipedia]
Wikipedia. Domain-specific language. http://en.wikipedia.org/wiki/Domain-specific_language, accessed 26 October 2014.



Copyright © 2014, H. Conrad Cunningham
Last modified: Mon Jan 12 12:42:20 CST 2015