# Domain-Specific Languages: From Design to Implementation Application to Video Device Drivers Generation

Scott A. Thibault, Renaud Marlet, and Charles Consel

**Abstract**—Domain-Specific languages (DSL) have many potential advantages in terms of software engineering ranging from increased productivity to the application of formal methods. Although they have been used in practice for decades, there has been little study of methodology or implementation tools for the DSL approach. In this paper, we present our DSL approach and its application to a realistic domain: the generation of video display device drivers. The presentation focuses on the validation of our proposed framework for domain-specific languages, from design to implementation. The framework leads to a flexible design and structure, and provides automatic generation of efficient implementations of DSL programs. Additionally, we describe an example of a complete DSL for video display adaptors and the benefits of the DSL approach for this application. This demonstrates some of the generally claimed benefits of using DSLs: increased productivity, higher-level abstraction, and easier verification. This DSL has been fully implemented with our approach and is available. Compose project URL: http://www.irisa.fr/compose/gal.

Index Terms—GAL, video cards, device drivers, domain-specific language, partial evaluation.

## **1** INTRODUCTION

IN contrast to a general purpose language (GPL), a *domain-specific language* (DSL) is a language that is expressive uniquely over the specific features of programs in a given problem domain. It is often small and more declarative than imperative; it may be textual or graphic. DSLs have also been called *application domain languages* [7], *little or micro-languages* [2], and are related to scripting languages. DSLs have been used in various domains such as graphics [14], [19], financial products [1], telephone switching systems [15], [21], protocols [8], [31], operating systems [28], and robot languages [5]. Languages such as SQL, T<sub>E</sub>X, and Unix shell languages may also be considered DSLs.

Software architectures based on DSLs are primarily aimed at achieving faster development of safer applications. Because constructs in a DSL abstract key concepts of the domain, the developer (that does not have to be a skilled programmer) can write more concise and higher level programs in less time. Programming with a DSL also contributes to safety because it is less error-prone than with a GPL. Additionally, high-level constructs translate, in practice, into the reuse of validated components. Moreover, when the language is small and specific, it is possible or easier to apply automated proof techniques that have been developed for general purpose languages, but have had limited success due to the generality of GPLs. For example, termination properties may be considered if the language is not Turing-complete. Similarly, it is easier to build test generation tools.

Manuscript received 19 June 1998; revised 9 Feb. 1999.

Recommended for acceptance by D.S. Wile.

For information on obtaining reprints of this article, please send e-mail to: tse@computer.org, and reference IEEECS Log Number 109170.

A DSL may also be seen as a way to parameterize a generic application or to designate a member of a program family. A *program family* is a set of programs that share enough characteristics that it is worthwhile to study them as a whole. In fact, designing a DSL actually involves the same *commonality analysis* [15] that is used in the study of a program family, i.e., determining assumptions that are true for all members of the family and variations among members. This process should be performed by both domain experts and software engineers.

Though actual uses of DSLs record benefits such as productivity, reliability, and flexibility [20], implementing DSLs is often difficult and costly [9]. There are two kinds of approaches to language implementation, each with significant disadvantages. Approaches that are based on compilers, such as application generators (translation from the DSL to a GPL), are not easy to write or to extend, and extensions require skills in compiler technology that cannot be expected from "domain developers." On the other hand, approaches that are based on interpreters are easier to write or to extend, but are less efficient [4]. This implementation issue also impacts maintainability because complexity in a DSL compiler defeats the software engineering goals of using a DSL [33]. Depending on one's objectives, either style of implementation is thus chosen: application generator or interpreter.

We have proposed a framework for the development of application generators that reconciles both alternatives, offering the flexibility of interpreters and the performance of compilers [30]. The framework relies on *partial evaluation* [16], [18], a program transformation technique that is well suited to automatically transform interpreters into compilers [17]. Partial evaluation exploits known information about a program's input to be able to evaluate parts of a program in advance. Given a program and the known

S.A. Thibault, R. Marlet, and C. Consel are with IRISA/INRIA, Université de Rennes 1, Campus Universitaire de Beaulieu, 35042 Rennes cedex, France. E-mail: thibault@gmvhdl.com; {marlet, consel}@irisa.fr. http://www.irisa.fr/compose

portion of its input, a partial evaluator produces a specialized program. In this new, semantically equivalent program, computations depending on known values have already been performed. Given an interpreter for a DSL (that takes as arguments a DSL program and the input data of the DSL program) and a DSL program, partial evaluation automatically yields a specialized interpreter, i.e, a compiled version of the DSL program.

Our framework is structured into two parts that come after the family analysis. The first part consists of the definition of an abstract machine, whose operations can be viewed as generic components that capture important operations of the domain. The second part is the definition of a DSL in terms of the abstract machine operations, thus providing a high level interface to the abstract machine. The use of partial evaluation in our framework is twofold, corresponding to each part: it maps a DSL program into an abstract machine program, removing the interpretation layer, and an abstract machine program into an efficient implementation. The development of this framework is supported by industry partners for realistic applications.

This paper describes a realistic application of our framework for the automatic generation of video card drivers. This domain naturally forms a program family, for which DSLs are well suited. We present the design and definition of a complete DSL for video display adaptors. Concerning performance, we show how partial evaluation can yield efficient drivers. Concerning safety, we insure that all generated drivers can be proven to terminate and define some analyses that can greatly improve their reliability. The DSL has been fully implemented with our approach and is available at URL http://www.irisa.fr/compose/gal.

Our contributions can be summarized as follows:

- We validate our framework of application generator design on a realistic example: video card device drivers.
- We define a DSL for generating such drivers. This restricted language allows program verifications.
- We provide a flexible implementation of this language that generates efficient video drivers.
- We illustrate the benefits of DSLs as a software architecture.

The rest of the paper is organized as follows. Section 2 describes our framework for application generator design in further detail. Section 3 presents the domain of video card drivers. Section 4 describes the two-level design: abstract machine and graphics adaptor language. Section 5 discusses the results of applying this approach to the domain of video drivers. Section 6 summarizes the results of this experiment and identifies future work, both for the language and the framework.

## 2 A FRAMEWORK FOR DESIGNING AND IMPLEMENTING DSLS

In a previous paper, we presented an approach to application generator design [30]. In this approach, we consider the implementation of a program family as a single generic program. The parameterization of this generic program corresponds to variations within the program family and can be represented using a microlanguage, i.e., a DSL. In other words, the generic program interprets DSL programs to know what actions pertaining to the application family to perform. The possible actions define an abstract machine that is adapted to the domain and the application family, whereas an interpretation layer, mapping constructs to actions, provides an interface between DSL programs and the abstract machine. The performance overhead due to genericity, in the interpretation layer as well as in the adaptation of the abstract machine, calls for optimization via partial evaluation.

This approach is the basis of a general framework for designing and implementing DSLs. This framework is sketched in Fig. 1; it is described further in the following subsections. More details concerning the impact on reuse (for code as well as expertise) and advantages over other application generators designs are given in [30].

## 2.1 Analysis

The first phase of the framework is a *family analysis* phase. This phase studies features that are present in all members of the family and variations among members. It can be conducted using a methodology such as FAST's *commonality analysis* [15]. The family analysis may also rely on a *domain analysis* [23], [25], [27], which discovers the commonalities in a domain. This analysis phase has two sets of outputs, which lead to the design of both a DSL and an abstract machine.

## 2.2 Abstract Machine Design

The analysis phase identifies key objects of the domain and program family, as well as basic operations on those objects. These operations are used to define an *abstract machine*, which offers a model of computation that underlies all programs in the family [9]. Other operations are also included so that it is possible to construct any program in the family from those operations.

The use of abstract machines is a natural progression from established reuse practices. Starting from the idea of highly parameterized subroutines in a reuse library [6], one might consider these to be generic components or operations that provide a level of abstraction. This level of abstraction provides insulation between the definition of the operation and an implementation. Given the context of a domain-specific solution, it then seems reasonable that for a given domain, we can define a collection of related operations that cooperate to solve the relevant problems in the domain. Finally, by enforcing an explicit state, as opposed to threading arguments across abstract machine instructions, we obtain an abstract machine model that can be implemented efficiently.

There are many advantages to this approach. One of these advantages is the opportunity to have several implementations of a single abstract machine. The same abstract machine could also be implemented in different languages. Another benefit of the approach is that it provides a formal model of computation that can be reasoned about using well-established techniques for abstract machines [26]. Being able to reason about operations in this way enables the verification of certain properties about DSL programs, or derive other properties like



Fig. 1. DSL design and implementation framework.

time complexities. The abstract machine model also provides the right level of decomposition to increase reuse of the abstract machine [35].

#### 2.3 Language Design

The analysis phase has three other outputs: 1) terminology, 2) commonalities, and 3) variations among the family members. This information, with the addition of constraints such as the level of the language or its analyzability, is used to design the DSL.

The idea is that the DSL is implemented in terms of the abstract machine. The key difference between the DSL and the abstract machine is that a DSL program describes *what* an application does and an abstract machine program describes how the application operates. The link between the DSL and the abstract machine is that the DSL can be viewed as a glue language for composing abstract machine operations, i.e., an interface to the abstract machine. This interface first provides a superior abstraction to the DSL program designer, and second, further restricts the applications that can be expressed, thus forming the program family. The DSL is generally designed to express programs in terms of domain-specific concepts. For usability and analyzability, it should have a semantics as restricted as possible, depending on the future requirements of the program family.

#### 2.4 Structuring the Implementation

A DSL can be implemented as either an interpreter or a compiler (to a target machine or a GPL). The most

straightforward approach to implementing a DSL is to build an interpreter. While an interpreter directly interprets each language construct to produce the results, a compiler produces a program, which when executed produces the results. Thus, the compiler approach introduces an indirection which makes it more difficult to construct. Moreover, an interpreter facilitates prototyping. For these reasons, we propose an approach based on interpreters.

In our framework, the implementation of the DSL is thus expressed as an interpreter, which calls the abstract machine operations. The abstract machine is typically implementated as a highly parameterized library.

Just as there could be many implementations for an abstract machine, this staged framework also provides the possibility to have many DSL languages for one abstract machine. Since the abstract machine can express a wide range of applications within the domain, and the DSL only a restricted subset of these, it is useful to have multiple DSLs for different users. For example, a DSL could manage a whole database while a subset of this DSL might only be able to express queries.

Although interpreters are easier to construct they are also less efficient. Similarly, the genericity of a parameterized library introduces execution-time overhead. In the next subsection, we present an approach to obtaining efficient implementations based on partial evaluation.

## 2.5 Efficiency via Partial Evaluation

There are two identified sources of inefficiency in the framework presented so far: the DSL interpretation layer and the parameterization of the library implementing the abstract machine. In particular, interpretation has been cited to be one to two orders of magnitude slower than compiled code [29]. There exist a technique to automatically remove these two kinds of overhead: partial evaluation. As a matter of fact, partial evaluation has proved to be very effective in mapping software architectures to efficient implementations [22].

Partial evaluation. Partial evaluation is a fully automatic program transformation which specializes a program to a particular context reducing its execution time and, in some cases, its size [10], [18]. A specialization context is defined by assigning values to some subset of a program's inputs. More specifically, consider a program p, taking some argument data d and producing a result r, which may be written as p(d) = r. If d can be split into  $d = (d_1, d_2)$  where  $d_1$  is a known (i.e., it does not vary) subset of the input, which describes the context, and  $d_2$  is yet *unknown*, we may form a new program  $(p, d_1)$  that waits until  $d_2$  is available and then calls the original pprogram on  $(d_1, d_2)$  to produce the same result r. In other words,  $(p, d_1)(d_2) = p(d_1, d_2) = r$ . However, since  $d_1$  is known, computations relying on  $d_1$  can be performed before  $d_2$  is actually available. Therefore, we can form a new program  $p_{d_1}$ , equivalent to  $(p, d_1)$ , where computations depending on  $d_1$  have been eliminated. We thus have  $p_{d_1}(d_2) = p(d_1, d_2) = r$ . The program  $p_{d_1}$  is called a *specialization* of p with respect to the known input  $d_1$ . The known inputs representing the context are also called static whereas the other unknown inputs are called dynamic. A partial evaluator is a program PE which computes  $p_{d_1}$ :  $PE(p, d_1) = p_{d_1}$ .

For the case study described in this article, we have used a partial evaluator named Tempo Specializer [10], [11]. Tempo is a fully automatic partial evaluator for C programs. Tempo can specialize programs at compile time (i.e., source-to-source transformation) as well as at run time. Users of Tempo specify inputs to the program entry point and global variables as either static (i.e., already known) or dynamic (i.e., yet unknown).

An example of partial evaluation. Fig. 2 shows an example of specializing a simple version of printf. The top part of the figure is the original code while the bottom part is the result of specializing the function with the input fmt equal to "n: %d". In fact, this example represents a very simple interpreter in the two-level framework in Fig. 1. The fmt parameter is a program which specifies how the data in the val parameter should be displayed. The putint, putchar, and abort functions are the abstract machine instructions that are used to print simple values. The mini\_printf function represents the interpretation layer which decides when and how to invoke the abstract machine instructions to implement the behavior specified in the fmt argument.

**Partial evaluation of interpreters.** Given an interpreter for a DSL (that takes as arguments a DSL program and the input data of the DSL program) and a known DSL program, partial evaluation can be applied to automatically produce an implementation that is specialized with respect to the DSL program, i.e., a compiled DSL program. Thus, the

```
mini printf(char fmt[], int val[])
{
 int i = 0;
 while (*fmt != \prime \setminus 0\prime) {
   if ( * fmt != '%' )
       putchar(*fmt);
   else
      switch(*++fmt) {
        case 'd': putint(val[i++]);
                    break;
        case '%': putchar('%');
                    break;
        default : <u>abort();</u> /* error */
      }
   fmt++;
 }
}
mini printf fmt(int val[])
{
  putchar('n');
  putchar(':');
  putchar('');
  putint(val[0]);
}
```

Fig. 2. Specialization with respect to fmt = "n: %d".

resulting functionality is equivalent to that of a compiler at the cost of writing an interpreter. The use of partial evaluation, that makes up a standalone application, given the generic program and a DSL program, can be considered an application generator.

**Ensuring efficiency.** If the mapping performed by the interpretation layer depends only on the input program and the input program is a known input, a partial evaluator should be able to eliminate the entire interpretation layer. Thus, as shown in Fig. 1, when the interpretation layer is specialized with respect to the input program, only invocations of the abstract machine instructions should remain: the result is an abstract machine program. In order to eliminate the interpretation layer, we rely on a program analysis performed during partial evaluation: binding time analysis.

The actual partial evaluation process is split into two phases: a *binding-time analysis* and the actual *specialization* transformation. During the binding-time analysis, dependencies are propagated to determine for each subexpression of a program if it depends only on known values and can, thus, be evaluated. As a result, each subexpression is given a *binding time* of static to mean it depends only on known inputs or dynamic otherwise. The second phase performs the specialization by evaluating the static expressions and outputting the specialized program.

The code in the top of Fig. 2 depicts a bind-time annotated function. The underlined expressions have dynamic binding times and the rest have static binding times. As expected, the only dynamic expressions in mini\_printf are calls to the abstract machine instructions. Thus, all the interpretation is evaluated at specialization time and all that remains are these calls, as shown in the bottom of Fig. 2.

The following rules define the requirements on the structure which guarantee the elimination of the interpretation layer.

- 1. References to the abstract machine state in the interpretation layer may only appear as subprogram arguments.
- 2. The abstract machine implementation may not contain any references to the interpreter state.

In our application of Tempo, we ensure the successful application of partial evaluation via the separation of the abstract machine and the interpreter, each having its on state represented in C by global variables. The interpreter state is specified as static and the abstract machine state is specified as dynamic. The visualization of the binding times produced by Tempo analyses lets the user assess the correct separation between the interpreter and the abstract machine, and thus the successful partial evaluation.

In Fig. 1, a second step of partial evaluation is shown in which the implementation of the abstract machine is specialized with respect to the abstract machine program. The reason for this second step is that the abstract machine operations are often highly parameterized reusable components. It may also be desirable to eliminate the genericity introduced by this parameterization. If these parameters depend only on the input program, then the abstract machine program will contain instructions with constant values for these arguments. The second step of partial evaluation will exploit these values to remove the genericity from the instructions' implementations. Specialization can also optimize inefficiencies introduced by the boundaries between operations. The reason that partial evaluation is done twice is to obtain the abstract machine program for analysis or any other reason. If the abstract machine program is not needed, a single application of partial evaluation to the whole interpreter will yield the same results.

There is an important difference between the genericity removed from the interpretation layer and that of the abstract machine layer. If the interpreter structure of our framework is respected then the interpretation layer is guaranteed to be removed. However, there is no guarantee on how much of the abstract machine layer will be removed.

## **3 VIDEO DRIVER DOMAIN**

This section introduces the domain of the experiment: video adaptor device drivers. A *video adaptor* (or *video card*) is a hardware component of a computer system which stores and produces images on the display. Video cards consist of a frame buffer, and a graphics controller. The *frame buffer* is a bank of high speed memory used to store the display data, including the currently visible image. The *graphics controller* consists of two main functionalities: producing the video signal for the display, and providing access to the frame buffer to create the display image. Graphics controllers all

provide similar sets of functionalities (e.g., changing the display resolution).

Although all adaptors provide similar functionalities, their programming interface is different from vendor to vendor, and often between successive models of the same adaptor. This is true of most devices, and is resolved by the use of device drivers. Device drivers generally consist of a library of functions that implement a standard API that is fixed for all devices. Thus, the driver's purpose is to translate the standard API operations into the operations required by a specific device, providing a uniform interface to the operating system and applications.

Video device drivers provide two main services to the operating system and applications. The first is to put the graphics display into different video modes. A *video mode* (or *graphics mode*) is defined by the horizontal and vertical resolution, the number of colors per pixel and screen refresh rates. The second service provided by the driver is to provide access to hardware drawing operations. For example, most video cards provide line drawing hardware, which draws lines on the display at a much faster rate than would be possible in software.

#### 4 APPLICATION OF THE APPROACH

We have applied the approach described in Section 2 to a family of device drivers for video adaptors. We considered an already existing set of device drivers from the XFree86 X Window server created by The XFree86 Project, Inc. [36]. The XFree86 SVGA server is a generic X Window server, written in C, supporting several different cards using a device driver architecture. This server contains drivers for cards from about 25 different vendors. Additionally, each driver supports as many as 24 different models from the same company. This structure alone indicates that there is enough similarities between models of the same vendor to implement them as a generic program, but that it is not reasonable to do so for multiple vendors. This may be due to efficiency, but more likely is due to the lack of a methodology to handle larger scales of variation.

The remainder of this section details the application of our approach to the construction of an application generator of video drivers (for different vendors) for the X Window server. We first discuss the definition of an abstract machine for the domain, identified by studying the existing device drivers. Then we describe a DSL for generating video drivers and related design issues.

#### 4.1 The Abstract Machine

The abstract machine for the video driver domain was designed primarily by studying the implementation of existing drivers. The abstract machine was also iteratively refined during the development of a DSL. We identified three patterns which appeared in the existing drivers that could be used to guide the definition of abstract machine operations.

**Operation pattern.** The first of these patterns corresponds to simple atomic operations in the abstract machine. There are two forms in which this pattern appears: as repeated fragments of code that differ only by data, and as fragments which perform the same treatment but have a

small number of variations on how it is performed. In the first case, the fragments are often already identified and placed in a library or defined as a macro. These fragments directly correspond to abstract machine operations.

As an example of the second case, the device drivers are dominated by occurrences of code fragments which read or write data from or to the video card. Communication with hardware devices can be handled in a small number of different ways, and the scheme chosen varies from vendor to vendor. There were several occurrences of three of these different schemes of I/O, differing only in certain data (e.g., the I/O address). These fragments were captured in a single abstract machine operation defined as follows:

This instruction is parameterized by flags to specify which scheme to use (indexed, paired, or PCI), and the data used by the scheme to perform the I/O (port\_number, index).

**Combination of operations pattern.** The second type of pattern recognized can be identified as expressions or combinations of operations. This pattern is characterized by expressions or combinations of operations that have no commonalities between members of the family. For example, in the device drivers there are sequences of shifts and logical expressions which are different for every driver. Although there are no commonalities in those expressions from one driver to the next, we can identify a sufficient set of operations to construct any instance. The selection of these operations depends not only on the existing samples, but on an understanding of the domain, and speculation on the future of the domain.

The following code fragment shows an example of this pattern from one of the existing drivers.

This portion of the driver maps the value of no onto the appropriate registers in order to select the clock. For a given driver, there may be any number of reads, writes, shifts and logic operations, but no other operations. Thus, we can implement any given driver with a sequential composition of a small number of abstract machine operations.

**Control pattern.** The last pattern consists of code fragments that share a common control structure, but contain code fragments matching the combination of operations pattern previously discussed. For example, consider a function of the device driver which is used to save, restore, and set the clock value on the video card.<sup>1</sup> This function has the following form:

1. Video cards have programmable clocks which can be set to different frequencies to control the video refresh rate.

```
Bool ClockSelect (int no)
{
    switch (no) {
        /* Save the clock value. */
        case CLK_REG_SAVE:
            Series of I/Os and logic operations.
            break;
        /* Restore saved clock value. */
        case CLK_REG_RESTORE:
        A second series of I/Os and logic operations.
            break;
        /* Set the clock value to no. */
        default:
        A third series of I/Os and logic operations.
    }
}
```

The series of I/Os and logic operations in this example follow the combination of operations pattern, and can be constructed by sequences of abstract machine operations.

For this pattern, we introduce higher-order abstract machine operations. That is, abstract machine operations which take sequences of abstract machine operations as parameters. These parameters correspond to the contained fragments that follow the combination of operations pattern. The example above is captured by the following abstract machine operation:

**Conclusion.** Using these patterns with existing examples, we were able to define an abstract machine that could express the behavior of any particular device driver. Although they were typically easy to recognize, it is important to realize that it was necessary to abstract from certain details in order to see the different patterns; e.g., in our experiment, the examples were mostly written by different people, who had different styles of programming, and sometimes took different approaches to the same problem. In this situation, it was necessary to determine if the same functionality could be implemented with a common structure, which happened to always be the case.

## 4.2 The GAL Language

In this section, we present the Graphics Adaptor Language (GAL) for video device driver specification. In order to understand where the language comes from, it is important to know what the essential variations are among video adaptors. The remainder of the section describes the variations that exist between cards and the corresponding constructs in GAL that capture them. A complete example of a GAL specification is described in Appendix A.

## 4.2.1 Ports, Registers, Fields, and Params

A video adaptor is controlled by setting certain parameters stored in hardware registers of the card. These registers have addresses. A single parameter may be stored in multiple registers and only certain bits of the registers may be used. Thus the layout of the parameters on the register space is the first major variation between cards.

Access to the registers are provided through various communication schemes. As mentioned in the previous section, there is a small number of different schemes that can be used to communicate with a hardware device from a program. The choice of communication scheme is the second major variation between cards. We define several concepts to describe these notions of communication and register layout.

**Ports.** The first concept is the *port* which is used to define a point of communication. For example, the declaration

port svga indexed := 0x3d4;

defines a port named svga, which uses an indexed communication scheme at the I/O address 0x3d4. This is a standard port used by many video cards.

**Registers.** A second concept is provided by the *register* declaration, which defines how to access registers on the card using the defined ports. For example, the declaration

register ChipID := svga(0x30);

defines a register ChipID, which is accessed through port svga, at index 0x30.

**Fields.** The next concept is specified with a field declaration. The *field* declaration defines where a logical value is stored (in which bits of what registers) and a mapping from logical values to actual stored values. For example, the declaration

```
field LogicalWidth :=
   Control2[5..4] # Offset scaled 8;
```

defines a field LogicalWidth, which is stored in bits 5 and 4 of the Control2 register and the entire Offset register. Additionally, the mapping clause (scaled 8) specifies that the value stored in the register is 1/8th the actual value. The mapping is needed because cards often store a value which is some function of the field's actual value.

**Parameters.** Related to the field declaration, the *parameter* declaration is the definition of a constant value that is either explicit in the specification or read from the card during configuration. An example of the former case would be

param NoClocks := 4;

The majority of a GAL specification consists of the definition of fields for standard values that are used to control the video adaptors and parameters which deetermine certain features of the card (e.g., size of the frame buffer). Table 1 lists some of these predefined field and parameter names that can be defined in GAL specifications.

#### 4.2.2 Clocks

A third major variation between different adaptors is the use of clocks. All adaptors have a clock which controls the frequency at which data is sent to the display. This frequency needs to be changed for different resolutions, and there are two approaches to doing this. One is to have a fixed number of frequencies to choose from, and the other is to have a programmable chip that can generate many frequencies by changing its parameters. The cards with a fixed number of clocks vary in the number of clocks and the frequencies provided, while the cards with a programmable clock vary in how the clock is programmed and its range of frequencies.

A card that has fixed clocks can be specified by defining a parameter NoClocks and a field ClockSelect. The

TABLE 1 Predefined Felds and Params

| Standard field | Purpose                   |
|----------------|---------------------------|
| HTotal,        | Horizontal resolution     |
| HEndDisplay,   | settings.                 |
| HStartBlank,   |                           |
| HEndBlank      |                           |
| VTotal,        | Vertical resolution       |
| VEndDisplay,   | settings.                 |
| VStartBlank,   |                           |
| VEndBlank      |                           |
| LogicalWidth   | Width of virtual screen.  |
| StartAddress   | Display start address.    |
| ClockSelect    | Clock selection.          |
| Standard param | Purpose                   |
| RamSize        | Frame buffer memory size. |
| LinearBase     | Address of linear space.  |
| LinearAperture | Size of linear space.     |
| NoClocks       | Number of fixed clocks.   |

NoClocks constant defines the number of clocks available, and the ClockSelect field defines the field which selects the clock.

For cards that have programmable clocks, a special construct is defined to specify how to program the clock. For example,

clock f3 is 14318 \* f3M / (f3N1 \* f3N2);

defines a clock named £3, which is programmable according to the equation on the right. The equation defines the frequency generated based on programmable values, which are defined elsewhere by the three fields £3M, £3N1, and £3N2. Given the desired clock frequency, the device driver uses the specified equation to find values of £3M, £3N1, and £3N2 which approximate this frequency as closely as possible.

#### 4.2.3 Identification

The fourth major variation observed among video cards is how the card is identified. This information is required for systems which dynamically configure themselves to use whatever card is available at that time. Card identification uses a small number of predicates which test the card and follows a decision tree to decide if the card is supported by the driver and which one.<sup>2</sup> Thus, we define an appropriate construct for specifying this type of decision tree in GAL.

The following is an example of this identification construct.

```
identification begin
```

```
1: writable(Segment) => (true => step 2);
```

```
2: Chip_id => (1 => oti087, others => step 3);
```

```
3: Chip_id2 => (0 => oti037c, 2 => oti067,
5 => oti077);
```

```
end identification;
```

2. One device driver often supports multiple cards from the same vendor.

This example identifies one of four models (oti037c, oti067, oti077, oti087) of cards that use an OTI graphics controller. The construct defines a series of steps numbered 1–3 to the left. At each step, the expression to the left of the arrow is evaluated and the result is compared to the list of decisions on the right. If no decision is matched on the right, then identification fails and indicates that the driver does not support the card. Possible decisions are to identify the card or proceed to another step. Step 2, for example, reads the value of the Chip\_id register, and if the result is 1, identifies that an oti087 is present, otherwise proceeds to step 3 for further tests. The stepwise syntax reflects the way diagnostic procedures are commonly described in manuals.

## 4.2.4 Modes

The final major variation between cards is that many adaptors require some flags be set under certain operating conditions. These are referred to as *modes of operation* in GAL, and are defined with the mode construct. The *mode* construct is used to specify a predicate and a sequence of assignments to fields, which enable or disable the corresponding mode of operation for the video card. For example,

This mode declaration defines a mode, HighRes, which indicates that a '1' must be stored in bit 5 of Control in order to use a video mode in which the horizontal resolution is greater than 800 pixels. In our implementation, the predicate HTotal > 800 is tested after changing the video mode; if it is true, the sequence Control[5] <= 1 is executed.

In addition to user defined modes, there are also a few built-in modes. The built-in modes have fixed predicates, but allow the specification of enabling and disabling sequences. For example, the built-in mode SVGAMode is true for all graphics modes and thus the user-defined enabling sequence is executed each time the mode is changed.

## 4.2.5 Run-Time Variations

In addition to the variations that exist between cards, there are variations within a single driver that depend on conditions not known until run-time (of the driver). For example, some video adaptors operate differently depending on the hardware bus utilized (AT, PCI, or VLB). Additionally, if one wants to build a single device driver for a number of models from the same vendor, the variation between those models has to be chosen at run-time. In GAL, the cases construct is used to describe this type of variation.

As an example, the following statement is used to define the clocks for different models of S3 cards.

```
cases
for S3_TRIO32, S3_TRIO64
  field ClockSelect := Miscr[3..2];
for others
  field ClockSelect := Control[3..0];
end;
```

This example specifies that if the card identified at run-time is a S3\_TRIO32 or S3\_TRIO64, then the card has four fixed clocks selected by bits 3 and 2 of the Miscr field. All other cards have 16 clocks selected by bits 3 down to 0 of the Control field.

## 4.3 Design of GAL

This section discusses some of the many forces that influenced the design of GAL. The first two subsections describe two main inputs to the design process: A definition of variations in the family and knowledge about the domain. In our case, the domain knowledge came from existing documentation used by domain engineers. Other important issues are the level of abstraction, the level of restriction, readability, maintainability, etc. While the level of abstraction and the level of restriction are of particular importance for DSLs, issues like readability and maintainability apply to both DSLs and GPLs.

## 4.3.1 Defining Variations

One of the main inputs to the design of a DSL is a description of the variations that exist among the target set of applications. The defined variations imply requirements on the DSL in order to distinguish among instances of the program family. In our case, these variations came from a study of the documentation of existing video cards. In addition to studying different cards, inspection of the existing device drivers provided a more detailed source of variations at the implementation level. For example, given that there were a small number of ways to communicate, which varied among cards, there must be some construct in GAL specifications, which would allow the selection of the correct communication scheme. Some of this information can also be extracted from the parameters of the abstract machine operations.

## 4.3.2 Domain Knowledge

The other main input to the DSL design process is knowledge of the domain in terms of the abstract objects or concepts and terminology used in the domain. This knowledge may come from a domain expert or from existing natural language specifications, as in our experiment. This is an important input because it leads to a more abstract user-level DSL. An appropriate terminology provides a DSL that is familiar to people of the domain. The identified abstract objects that are affected by variations in the program family provide starting points for declarative constructs.

In this experiment, we looked at several English specifications of video cards to identify the concepts and terminology used within the domain. The clocks, ports, and registers are examples of concepts in the domain that we identified. After identifying them, we considered what attributes of the objects were related to variations within the program family. Declarative statements were then defined to specify the values for the attributes that varied. Thus, the abstract objects identified in our experiment directly translated to declarative constructs in the DSL. Additionally, the relationship between the objects translated into a reference relationship in the DSL. For example, registers are defined by references to port definitions. This may suggest the use of an object-oriented analysis for DSL design.

## 4.3.3 Level of Abstraction

One of the most important goals guiding the DSL is to provide a high-level of abstraction. In particular, we wish to intentionally focus on raising the level of abstraction from the abstract machine level. In fact, it may be desirable to include information in the DSL, which is not even used for implementation, but may be used in analyses or for documentation.

As an example of abstraction, the abstract machine developed for the video device drivers includes operations for doing bitwise shifts and logical operations. However, these types of expressions do not appear in GAL because we intentionally introduced the idea of fields and parameters to eliminate the low-level procedural nature of these expressions. This also eliminates a common source of errors.

After a preliminary design of the language, the language and abstract machine are revised in an iterative way. The revision process must satisfy the correspondence constraint between the language and abstract machine: it must be feasible to provide a mapping from the language to the operations of the abstract machine as an interpreter. During this revision process the level of abstraction must also be considered. Although it is possible to move all of the functionality of the language into the abstract machine, making the mapping essentially one-to-one, there must be conscious decisions made about where to draw the line between the interpreter and the abstract machine. The primary consideration here is the separation of functionality from specification. The abstract machine should specify how applications in the family are implemented. The interpreter, on the other hand, should specify how to make the design decisions required to map a design specification (i.e., DSL program) into an implementation (i.e., abstract machine operators).

## 4.3.4 Level of Restriction

Another major concern is restricting the language. It is important to consider what types of analyses might be performed on specifications in the DSL in order to ensure that the language is restricted enough to make the analyses feasible. For example, in the GAL language we have intentionally not introduced loops, which ensures that all device drivers can be proven to terminate. Additionally, we perform other analyses to detect common errors in the specification by providing explicit information that is difficult or impossible to extract from general purpose languages. An example of this is checking that the bits of each register belong at most to one field. This information could not be retrieved, in general, from a driver implemented in a language such as C.

## 4.3.5 GPL Principles

In addition to the design goals that are specific to DSLs, there are several principles of general purpose language design that also apply to DSL design. General purpose languages can also help DSL design by providing a standard set of constructs that may be restricted for use in the DSL, but would still be recognized as a common construct.

On the other hand, the cases construct introduced in GAL is an interesting example of a construct which possibly has applications in DSLs in general (when a predefined abstraction may, conditionally, have one of several definitions), but is not useful for GPLs, since the behavior is totally described by the program itself and abstractions are explicitly invoked. One of the main purposes of introducing a DSL and an application generator is to embed knowledge about how to implement certain operations of the domain into the application generator. As a result, there are often declarative constructs in DSLs that are translated into executable code by the application generator, which is not generally true of general purpose languages. Since these declarations really imply operations, there is often a need to make choices between the implied operations that can only be made at run-time. This leads to the type of dynamic selection of multiple definitions that is provided by the cases statement. Since a main motivation of utilizing a DSL is to raise the level of abstraction, it will be common for DSLs to have declarative objects which imply operations and require this dynamic selection. Thus, we suspect that this construct will be useful in DSLs in general, and in fact have found it necessary in other DSLs that we have experimented with. This suggests that there are new constructs and principles that are interesting and unique to DSLs and warrant study.

## 5 RESULTS

In this section, we present the results of applying our framework to the domain of video device drivers. The results are presented in terms of the advantages we have gained from using our approach for this family of drivers. There are two aspects of the approach that led to these advantages. One aspect is the use of DSLs and application generators in general, and the second is specific to our framework for application generator design.

## 5.1 Domain Specific Language

The GAL language demonstrates many advantages of using an application generator with a DSL for the video device driver domain. These benefits include an increased level of abstraction, the possibility of automated program analyses, reuse, and productivity.

There are two significant examples of the benefit of a higher level of abstraction. The first, already discussed in Section 4.3.3, is the use of ports, registers, and fields to abstract from the low-level bitwise operations that would otherwise have to be used. This eliminates many common errors, is more readable, and easier to write. A second example is an abstraction from implementation. The X Window server can be considered a framework, where the device driver provides the additional functions. As with any framework, the device driver needs to be implemented in a certain way in order to be compatible with the server and requires considerable knowledge about the framework. Using an application generator, knowledge about the framework and compatibility issues are coded in the application generator, and hidden from the designer.

GAL also demonstrates that automatic analyses can be performed on the DSL, which would not be possible or

feasible with a general purpose language. Example analyses that are performed on GAL specifications include detecting unused definitions, checking for exhaustive identification of video cards, identifying overlap in field definitions, checking for minimum requirements on predefined fields, and generating a card profile (summary of card characteristics). None of these analyses would have been feasible on the existing device drivers implemented in C. Using GAL not only makes the analyses feasible, but also easy to implement. For example, all of these analyses for GAL were implemented within a single day.

One particularly interesting analysis is the one which generates a card profile. Generating a card profile is an analysis which, from the GAL specification, produces a summary of the video modes that are supported by the generated device driver. Fig. 3 shows an extract of the profile generated for the S3 specification listed in Appendix B. A profile is generated for each subset of cards in the specification that have the same profile. The figure shows the profile for the S3\_TRIO64 and S3\_TRIO32. This summary can be compared with vendor specifications to find mistakes in field definitions and provides automatic documentation of the specification.

Finally, using an application generator provides reuse by capturing design knowledge. In the domain of video device drivers there are large benefits of reuse because there is a large growing number of video cards which could potentially be generated from a single application generator. The amount of productivity gained depends on the ease of building the application generator and consequently on the approach to its design. Thus, we discuss productivity measurements in the next section with respect to our framework.

## 5.2 Our Framework

In addition to the advantages obtained from the DSL approach, there are several advantages demonstrated by GAL due to our framework of generator design. The experiment shows that the framework achieves automatic and predictable generation of efficient video drivers, and a high-level of reuse. GAL also demonstrates that the benefits of the two-level approach for analyses and multiple implementations are of practical value.

## 5.2.1 Reuse and Productivity

The abstract machine for X Window device drivers consists of 95 small C procedures totaling 1,200 lines. Implementing the abstract machine has roughly the same difficulty level

```
Profile of S3_TRIO64 + S3_TRIO32
Maximum resolution: 4088x2047
Maximum virtual screen: 3328x2520
(with maximum RAM)
Ram size: 512k-8192k
Clock range: 135-270MHz
Resolution limited to 2304x1728 by
the clock (max. refresh 67Hz).
```

as implementing a single driver directly, as the code is very similar. Since we had existing device driver implementations, some of the abstract machine code could be reused from those drivers.

Table 2 summarizes the number of lines of code in the GAL system in comparison to writing drivers in C. The interpreter for GAL consists of 4,300 lines of C code and an automatically generated parser, much of which concerns building an environment and look-up routines for declarations. Thus, together the system consists of about 5,500 lines of C code. We can compare this to the size of the existing hand-coded drivers which averaged about 1,500 lines. Though the effort required to build an interpreter should be less than that for building a device driver, we can estimate that the application generator requires a little more than 3.5 times the effort of an individual driver (assuming code size proportional to effort).

For the version of the X Window server we used, the existing drivers together consisted of 35,000 lines of code. The GAL specifications that have been written are at least a factor of 9 smaller than the corresponding existing C driver . We can then estimate that these drivers could be generated from less than 4,000 lines of GAL specifications plus the 5,500 lines of the generator, totaling less than 10,000 lines. This is an estimated productivity gain of a factor of 3.5. In practice there would be a higher gain, since GAL specifications are easier to write then the corresponding C driver. In addition, having an interpreter for GAL provides a prototyping environment.

## 5.2.2 Efficiency

Here we consider two measures of efficiency: object code size and execution speed. Although designing an interpreter is easier than designing a compiler, there are significant losses in speed and size (compared to compilation). In terms of speed, interpreters are typically 10–100 times slower than compiled programs, and in terms of size, our GAL interpreter is 10 times larger than a typical driver in object code size. However, a benefit of using partial evaluation is that we can regain the loss in efficiency.

We used Tempo [11], a partial evaluator for C, as the program specializer used to translate GAL specifications to abstract machine programs, and to produce an efficient implementation of the abstract machine programs. In order to make a size comparison, we compared the object file

|       | ٦  | TABLE | 2       |
|-------|----|-------|---------|
| Lines | of | Code  | Summary |

|                | Average lines | Total lines     |
|----------------|---------------|-----------------|
|                | per driver    | for all drivers |
| GAL generator  |               | 5500            |
| GAL driver(s)  |               |                 |
| (estimated)    | 167           | 3900            |
| GAL generator  |               | 9400            |
| plus driver(s) |               |                 |
| C driver(s)    | 1500          | 35000           |

sizes of the generated drivers to that of the hand-coded drivers. On average, the generated driver is only 30 percent larger than the hand-coded one. One main difference that lead to an increase in code size is that the hand-coded drivers often use loops to access a block of contiguous registers. GAL does not recognize when registers are contiguous, although it could. A second difference is that hand-coded drivers are not always careful about saving and restoring all registers.

The speed of most of the device driver functions are insignificant, as they are only called during configuration. However, we picked three device driver functions used for drawing lines and rectangles in hardware to benchmark performance. Since the interpreter level of our framework is guaranteed to be eliminated (see Section 2), we are only concerned with the abstract machine layer.

For comparison, we prepared three versions of the X Window server for an S3 TRIO64V + video card on a Pentium PRO-200. Table 3 shows the timing results for the three servers. The S3 XAA server is the X Window server provided with XFree86 and the included hand-coded S3 device driver. S3 AM is the same server with a device driver which directly uses the abstract machine. Finally, S3 PE is the same server using the abstract machine, but after partial evaluation. The table shows the performance of these servers for lines and filled rectangles of size 10 as measured by the standard XBench benchmark utility.<sup>3</sup> The table also includes a percentage using S3 XAA as a baseline.

The table indicates that there is a loss of about 20 percent in performance from the use of the abstract machine. This loss of performance can be contributed to error checking, interpretation, function call, and data copying overhead. Data copying is due to the need to communicate across abstract machine operations. The write operation includes error checking to ensure that if previous operations fail the resulting data is not written to the card. This is particularly important because the card could otherwise be damaged. Finally, the I/O operations require some interpretation of their parameters to determine the type of I/O to perform and which addresses to use. Although directly using the abstract machine incurs this performance loss, the results for the S3 PE server show that the program transformations performed by partial evaluation are able to recapture all of the performance loss. A majority of the error checking can also be eliminated using Tempo because often the operations preceding write operations cannot fail, and thus error conditions do not need to be checked. Finally, the parameters which are interpreted to select the type of I/O to perform and used for address computation are known and eliminated by Tempo. Tempo also performs inlining and copy elimination which eliminates function call and data copying overhead.

## 5.2.3 Analyses

Our framework for application generator design contributes in two ways to the use of program analyses. The generation process is predictable and can be analyzed, and the

TABLE 3 Performance Results

| Server           | lines/s                 | percent        |
|------------------|-------------------------|----------------|
| S3 XAA           | 189,000                 | 100            |
| S3 AM            | 150,000                 | 79             |
| S3 PE            | $191,\!000$             | 100            |
|                  |                         |                |
| Server           | rectangles/s            | percent        |
| Server<br>S3 XAA | rectangles/s<br>203,000 | percent<br>100 |
|                  | - ,                     | <u>^</u>       |

separation of the abstract machine from the interpreter allows analysis at the abstract machine level.

As an example, the GAL abstract machine includes operations that allocate and deallocate temporary storage and operations which use the temporary storage. As long as the operations which use the temporary storage are only used between a set of allocate and deallocate operations, we can insure there will be no uninitialized pointer dereferences. The analyses of partial evaluation are capable of producing a specification of all the programs that could possibly be generated by the partial evaluation process. From this, we can obtain a formal description of all possible abstract machine programs that could be generated, and can check that the operations are always generated in the correct order. Thus, for the GAL system we can prove that uninitialized pointer dereferences will never occur. This description of the generation process may also be analyzed for performance properties, for example.

The separation of the abstract machine and the DSL provides an intermediate level at which analyses can be performed and could allow analysis at run-time. In fact, this separation corresponds to a standard technique of program specification, which factors the verification process into two parts [3]. As an example of analysis at run-time, we may wish to check that device access within a video driver is safe (e.g., does not access the disk device). This cannot be done until run-time because it depends on what devices are present at run-time. In this case, we might accept video drivers in abstract machine form and analyze the abstract machine at run-time. Partial evaluation can be performed at run-time [12], so the efficiency can still be recaptured. This kind of analysis is not feasible on machine code or even Java bytecodes due to their general purpose nature. In proofcarrying code [24], the burden of proof is put on the programmer and the proof is sent with the code to be verified (verification being easier), whereas here we make the proof easier so that it can be done at run-time.

## 5.2.4 Multiple Implementations

The video device driver family also demonstrates a useful application of having multiple implementations of interpreters and abstract machines. In this domain, it would be desirable to have abstract machines for several architectures and interpreters for different operating systems. For example, Fig. 4 shows the situation where there are implementations of interpreters for Microsoft Windows 95

<sup>3.</sup> A small size is used to ensure measurements are not dominated by hardware operations which are independent of the driver.



Fig. 4. Multiple Implementations.

and Linux/X11, and implementations of the abstract machine for the Dec Alpha and Intel based computers. In this situation, with the equivalent of two application generators (interpreter/abstract machine pairs), the same GAL specification can be used to generate four different device drivers. We have implemented the X11/Intel path of Fig. 4.

For prototyping, we have also benefited from having a second implementation of the abstract machine which simulates the abstract machine operations. The simulation records the values that would be written to the card by the real abstract machine. This is an important feature as some video adaptors can be damaged by writing inappropriate values to the card.

## 6 CONCLUSIONS AND FUTURE WORK

Domain specific languages hold the promise of delivering high payoffs in terms of software reuse, automatic program analysis, and software engineering. In this paper we have presented GAL, an example of a complete DSL for a realistic program family: video device drivers. We also demonstrated the benefits of DSLs by showing how GAL raises the level of abstraction of device driver specifications and identifying some analyses that can be performed on GAL specifications because it is domain specific.

A further contribution of the paper is to validate our framework of application generator design by applying it to this program family to provide an implementation of GAL. Since our implementation is based on partial evaluation, it provides a complete interpreter for prototyping device drivers, but still automatically generates efficient device drivers. Efficiency is demonstrated with results comparing hand-coded drivers to automatically generated device drivers. Generated drivers are roughly one third larger than hand-code drivers and perform equivalently in terms of speed. Additionally, we give measures on expected reuse benefits; GAL specifications are roughly a factor of 9 smaller than a driver hand-coded in C.

The techniques presented in this paper have also been applied to the active networks domain [31]. In this work, we have developed PLAN-P, a DSL for active networks. By using techniques for run-time specialization, we have successfully specialized PLAN-P programs at run time to achieve the effect of a just-in-time compiler (JIT). Experimental results show that the programs produced by the run-time specializer incur no overhead in overall system performance in comparison to using handwritten C code. Furthermore, in comparison to Java, another mobile code approach, the specialized program is twice as fast as an equivalent Java program compiled with an optimizing offline byte-code compiler.

Although our framework significantly reduces the development time of application generators, future work could be done in this direction. Specifically, this approach would benefit from a generator-specific reuse method that would allow interpreters and abstract machines to be constructed from reused composable parts. Additionally, given the nature of DSLs, they are extended frequently to adapt to new program requirements, and the ease of extension also needs to be considered for such language components.

Our implementation of the static analyses indicates that methods of quickly constructing static analyses should also be investigted (e.g., composable anlayses). This is more important for DSLs than GPLs, since static analyses are a major motivation of the approach.

In this work we have presented an application of our approach to a program family with existing family members. To further validate the approach, it is also important to study its application to a program family which is not pre-existing. In this case, the abstract machine and DSL might be developed from the results of a domain analysis or a commonality analysis, such as FAST [15].

## **APPENDIX** A

## A COMPLETE GAL EXAMPLE

Appendix B gives a complete listing of the GAL specification for several models of S3 video adaptors. In this appendix, we explain some of the constructs that were not included in the main text.

Although the various registers of video cards are typically accessed using an addressing scheme, there is sometimes a sequential procedure that must be followed to access some registers. The serial construct is used to specify this kind of procedure (see listing). The construct consists of a list of sequences of actions that should be performed on the ports to access the registers. Thus, multiple ports may be accessed during the procedure, as in the example. Each sequence consists of a port, an operation (<= write, <=> read/write, => read), and a sequence of values for writes or registers names for reads and read/writes. The actions in the sequence are performed from the first port to the last, from left to right in the sequence. The mode (R read, R/W read/write, W write) to the right of the sequence indicates whether this sequence applies to reading the registers to writing the registers or both.

The serial construct in the example defines the registers PLL1, and PLL2. In order to write values to these registers the construct would be executed as follows. Write 3 to misc[3..2], write the value of PLL1 to seq(0x12), write the value of PLL2 to seq(0x13), and finally, write 0, then 1, then 0 to seq(0x15) [5].

The S3 specification also includes an example of a *derived field*, which is not discussed in the paper. This is a field whose value is derived from one of the standard fields. In the example, StartFIFO is a derived field. Its value is set whenever the graphics mode is set, and is based on the value of HTotal, the horizontal resolution. The declaration indicates this with the from clause.

The clockmap is used when a card has both fixed and programmable clocks such as the S3 Trio cards. It indicates which clocks are fixed and which are programmable. The example for the S3 indicates that clock 0 and 1 are fixed, clock 2 is not available (NA), and clock 3 is the programmable clock f3. The parameters MinPClock and MaxP-Clock are also related to clocks and specify the minimum and maximum values that can be generated by the clock (i.e., not all values of f3M, f3N1, and f3N2 are valid).

Finally, the operating mode access is used to lock an unlock registers on the card.

## APPENDIX B GAL S3 LISTING

```
-- List all cards/models supported by this
  driver.
chipsets
  S3_911, S3_924, S3_80x, S3_928, S3_864,
  S3_964, S3_866, S3_868, S3_968, S3_TRIO32,
  S3_TRIO64;
-- Define ports.
port svga indexed := 0x3d4;
port seq indexed := 0x3c4;
port misc := 0x3cc, 0x3c2;
--Define registers.
register Miscr := misc;
register Slock := seq(0x8);
register Offset := svga(0x13);
register ExtChipID := svga(0x2e);
register ChipID := svga(0x30);
register Memory := svga(0x31);
register State := svga(0x36);
register Lock1 := svga(0x38);
register Lock2 := svga(0x39);
register StartFIFOr := svga(0x3B);
register Misc1 := svga(0x3a);
register Control := svga(0x42);
register Control2 := svga(0x51);
register HOverflow := svga(0x5D);
register VOverflow := svga(0x5E);
```

register Control3 := svga(0x69);

```
--Serial registers (see Appendix A).

serial begin

misc[3..2] <= (3,- ,-,-,-) W;

seq(0x12) <=> (-, PLL1, -, -, -) R/W;

seq(0x13) <=> (-, PLL2, -, -, -) R/W;

seq(0x15)[5] <= (-,- ,0,1,0) W;

end;
```

-- Define predefined fields

```
-- Horizontal resolution fields.
field HTotal := HOverflow[0]#std;
field HEndDisplay := HOverflow[1]#std;
field HStartBlank := HOverflow[2]#std;
field HStartRetrace := HOverflow[4]#std;
```

```
--Vertical resolution fields.
field VTotal := VOverflow[0]#std;
field VEndDisplay := VOverflow[1]#std;
field VStartBlank := VOverflow[2]#std;
field VStartRetrace := VOverflow[4]#std;
--Virtual screen fields.
  field LogicalWidth :=
   Control2[5..4] #Offset scaled 8;
cases
for S3_928, S3_968, S3_TRIO32, S3_TRIO64
  field StartAddress :=
   Control2[1..0] #Memory[5..4] #std;
for S3 80x
  field StartAddress :=
   Control2[0] #Memory [5..4] #std;
for S3_864, S3_964
  field StartAddress := Control3[4..0]#std;
for others
  field StartAddress := Memory[5..4]#std;
end:
-- Define derived fields (see Appendix A).
```

```
ield StartFIFO from HTotal :=
    HOverflow[6]#StartFIFOr offset 10 scaled 8;
```

--Special S3 flags that must be set for 256 color graphics modes. enable SVGAMode sequence is Misc1[4] <= 1, Memory[3] <= 1;</pre>

```
--Define standard parameters.
param TwoBankRegisters := false;
param InterlaceDivide := true;
```

```
cases
for S3_911, S3_924
  param RamSize := State[5] mapped
    (0 => 1024,1 => 512);
for others
  param RamSize := Se[7..5] mapped
    (0 => 4096,2 => 3072> 8192, 4 => 2048,
    5 => 5120, 6 => 1024,7 => 512);
end;
```

-- Define clocks.

```
cases
for S3_TRIO32,S3_TRIO64
  param NoClocks := 4;
  field ClockSelect := Miscr[3..2];
  param MinPClock := 135;
  param MaxPClock := 270;
  field f3M := PLL2[6..0] offset 2
    range 1 to 127;
  field f3N1 := PLL1[4..0] offset 2
    range 1 to 31;
  field f3N2 := PLL1[6..5] mapped
  (0 => 1, 1 => 2, 2 => 4, 3 => 8);
  clock f3 is 14318 * f3M / f3N1 * f3N2;
```

```
clockmap is (fixed, fixed, NA, f3);
for others
param NoClocks := 16;
field ClockSelect := Control[3..0];
```

```
end;
```

```
-- Identification procedure.
identification begin
1: ChipID[7..4] =>
  (0x8 => step 2, 0x 9 => S3_928,
    0xA => S3_80x, 0xB => S3_928,
    0xC => S3_864, 0xD => S3_964,
    0xE => step 3);
2: ChipID[1..0] => (0x1 => S3_911,
    0x2 => S3_924);
3: ExtChipID =>
    (0x10 => S3_TRIO32, 0x11 => S3_TRIO64,
    0x80 => S3_866, 0x90 => S3_868,
    0xB0 => S3_968);
end;
```

-- Register locks on S3 chips. enable access sequence is Lock1 <= 0x48, Lock2 <= 0xA5, Slock <= 0x6; disable access sequence is Lock1 <= 0x00, Lock2 <= 0x5A, Slock <= 0x0;</p>

#### ACKNOWLEDGMENTS

This work has been partially supported by France Telecom under Contract No. CNET 96-1B-027 and by Defense Advanced Research Projects Agency under Contract No. F19628-95-C-0193.

#### REFERENCES

- B.R.T. Arnold, A. van Deursen, and M. Res, "An Algebraic Specification of a Language Describing Financial Products," *Proc*, *IEEE Workshop Formal Methods Application in Software Eng.*, pp. 6– 13, Apr. 1995.
- [2] J. Bentley, "Programming Pearls: Little Languages," Comm. ACM, pp. 711–716, Aug. 1986.
- [3] H.K. Berg, W.E. Boebert, W.R. Franta, and T.G. Moher, Formal Methods of Program Verification and Specification. Englewood Cliffs, N.J: Prentice Hall, 1982.
- [4] J.A. Bergstra and P. Klint, "The ToolBus Coordination Architecture," Proc. First Int'l Conf. Coordination and Models, Cesena, Italy, Lecture Notes in Computer Science, vol. 1,061, pp. 75–88, 1996.
- [5] E. Bjarnason, "Applab: A Laboratory for Application Languages," L. Bendix, K. Nrmark, and K. Sterby, eds., Proc. Nordic Workshop Programming Environment Research, Technical Report R-96-2019, Aalborg Univ., May 1996.
- [6] G. Booch, Software Components with Ada. Benjamin/Cummings, 1987.

- [7] J. Bosch and G. Hedin, eds., Proc. Workshop Compiler Techniques for Application Domain Languages and Extensible Language Models, Linköping, Technical Report 96-173, Lund Univ., Apr. 1996.
- [8] S. Chandra and J. Larus, "Experience with a Language for Writing Coherence Protocols," Proc. First USENIX Conf. Domain-Specific Languages, Santa Barbara, Calif., Oct. 1997.
- [9] J.G. Cleaveland, "Building Application Generators," IEEE Software, July 1988.
- [10] C. Consel, L. Hornof, J. Lawall, R. Marlet, G. Muller, J. Noyé, S. Thibault, and N. Volanschi, "Tempo: Specializing Systems Applications and Beyond," ACM Computing Surveys, Symp. Partial Evaluation, 1988, to appear.
- [11] C. Consel, L. Hornof, F. Nol, J. Noy, and E.N. Volanschi, "A Uniform Approach for Compile-Time and Run-Time Specialization," Danvy et al., eds., *Partial Evaluation Int'l Seminar, Dagstuhl Castle*, pp. 54-72, Feb. 1996.
- [12] C. Consel and F. Nol, "A General Approach for Run-Time Specialization and Its Application to C," Proc. Conf. Record of the 23rd ACM SIGPLAN-SIGACT Symp. Principles of Programming Languages, pp. 145–156, St. Petersburg Beach, Fla., Jan. 1996.
- [13] O. Danvy, R. Glck, and P. Thiemann, eds., *Partial Evaluation*, *International Seminar*, Dagstuhl Castle, Lecture Notes in Computer Science 1,110, Feb. 1996.
- [14] C. Elliott, "Modeling Interactive 3D and Multimedia Animation with an Embedded Language," Proc. First USENIX Conf. Domain-Specific Languages, Santa Barbara, Calif., Oct. 1997.
- [15] N.K. Gupta, L.J. Jagadeesan, E.E. Koutsofios, and D.M. Weiss, "Auditdraw: Generating Audits the Fast Way," *Proc. Third IEEE Symp. Requirements Eng.*, pp. 188–197, Jan. 1997.
- [16] N.D. Jones, "An Introduction to Partial Evaluation," ACM Computing Surveys, vol. 28, no. 3, pp. 480–503, Sept. 1996.
- [17] N.D. Jones, "What Not to do When Writing an Interpreter for Specialisation," Danvy et al., eds., Proc. Partial Evaluation International Seminar, Dagstuhl Castle, pp. 216-237, Feb. 1996.
- [18] N.D. Jones, C. Gomard, and P. Sestoft, *Partial Evaluation and Automatic Program Generation*, Int'l Series in Computer Science, Englewood Cliffs, N.J.: Prentice Hall, June 1993.
- [19] S. Kamin and D. Hyatt, "A Special-Purpose Language for Picture-Drawing," Proc. First USENIX Conf. Domain-Specific Languages Santa Barbara, Calif., Oct. 1997.
- [20] R. Kieburtz, L. McKinney, J. Bell, J. Hook, A. Kotov, J. Lewis, D. Oliva, T. Sheard, I. Smith, and L. Walton, "A Software Engineering Experiment in Software Component Generation," *Proc. 18th IEEE Int'l Conf. Software Eng, ICSE-18*, pp. 542–553, 1996.
- [21] D. Ladd and C. Ramming, "Two Application Languages in Software Production," USENIX Symp. Very High Level Languages New Mexico, Oct. 1994.
- [22] R. Marlet, S. Thibault, and C. Consel, "Mapping Software Architectures to Efficient Implementations via Partial Evaluation," *Proc. Conf. Automated Software Eng.*, pp. 183–192, Lake Tahoe, Nev., IEEE Computer Society, Nov. 1997.
- [23] R. McCain, "Reusable Software Component Construction: A Product-Oriented Paradigm," Proc. Fifth AiAA/ACM/NASA/IEEE Computers in Aerospace Conf., Long Beach, Calif., Oct. 1985.
- [24] G. Necula, "Proof-Carrying Code," Conf. Record 24th Symp. Principles of Programming Languages, pp. 106–116, Paris, ACM Press, Jan. 1997.
- [25] J. Neighbors, "Software Construction Using Components," PhD thesis, Univ. of Calif. at Irvine, 1980.
- [26] G.D. Plotkin, "A Structural Approach to Operational Semantics," Univ. of Aarhus, Aarhus, Denmark, 1981.
- [27] R. Prieto-Díaz, "Domain Analysis: An Introduction," Software Eng. Notes, vol. 15, no. 2, Apr. 1990.
- [28] C. Pu, A. Black, C. Cowan, J. Walpole, and C. Consel, "Microlanguages for Operating System Specialization," Proc. First ACM-SIGPLAN Workshop Domain-Specific Languages, wDSL'97, Paris, Jan. 1997.
- [29] T. Romer, D. Lee, G. Voelker, A. Wolman, W. Wong, J. Baer, B. Bershad, and H. Levy, "The Structure and Performance of Interpreters," *Proc. Seventh Int'l Conf. Architectural Support for Programming Languages and Operating Systems*, pp. 150–159, Oct. 1996.
- [30] S. Thibault and C. Consel, "A Framework for Application Generator Design," Proc. Symp. Software Reusability, Boston, Mass., May 1997.

- [31] S. Thibault, C. Consel, and G. Muller, "Safe and Efficient Active Network Programming," *Proc. 17th IEEE Symp. Reliable Distributed Systems*, West Lafayette, Ind., Oct. 1998.
- [32] S. Thibault, J. Marant, and G. Muller, "Adapting Distributed Applications Using Extensible Networks," Proc. 17th IEEE Conf. Distributed Computing Systems, Austin, Texas, May 1999.
- [33] A. van Deursen and P. Klint, "Little Languages: Little Maintenance?" Proc. First ACM-SIGPLAN Workshop Domain-Specific Languages, wDSL'97, Jan. 1997.
- [34] First ACM-SIGPLAN Workshop Domain-Specific Languages, Paris, France, Computer Science Technical Report, Univ. of Illinois at Urbana-Champaign, Jan. 1997.
- [35] B.W. Weide and W.F. Ogden, "Recasting Algorithms to Encourage
- Reuse," IEEE Software, vol. 11, no. 5, Sept. 1994.
- [36] The XFree86 Project, http://www.xfree86.org/.



Scott A. Thibault received the BS degree in computer science from the University of Vermont in 1993; the MS degree in computer science from the University of Missouri at Rolla in 1995; and the PhD degree in computer science from the Université de Rennes I in 1998. He is currently founder and president of Green Mountain Computing Systems, Inc. His current research interests include the development and application of domain-specific languages, net-

work security, and electronic-design automation. He has published four conference papers in the area of domain-specific languages.



Renaud Marlet received his PhD degree in computer science from the University of Nice-Sophia Antipolis, France, in 1994. He is a research associate scientist in the Compose group at IRISA/INRIA-Rennes, France. His work focuses on two facets of program adaptation, i.e., the ability for a program to adapt to the context in which it is used (using partial evaluation and specialization declarations) and the adaptation of programming to specific

domains or application families (by means of adaptable software components and domain-specific languages). He is one of the codevelopers of Tempo, a partial evaluator for C.



Charles Consel received his PhD degree from the University of Paris 6 in 1989. He worked as a research faculty member for three years at Yale University until 1992. He then spent one year at Oregon Graduate Institute as an assistant professor. He is now a professor of cmputer science at the University of Rennes 1. He leads the Compose group at IRISA/INRIA. His group studies partial evaluation, a program transformation approach aimed at specializing programs

with respect to given execution contexts. This work has been carried out in practice with a program specializer for C called Tempo. This system has been successfully used in various applications, such as operating systems and scientific code. A complementary research project is domain specific languages: a software engineering approach that provides high productivity, easy maintenance and improved safety (without giving up performance, thanks to partial evaluation). His other research interests include semantic program analysis, compilation and compiler generation, programming environment, prototyping, program transformation, and formal specification. His work on programming languages, software engineering, and operating systems has led to many publications in major conferences and journals, such as: POPL, PLDI, OOPSLA, SOSP, ASE, TOPLAS, ACM Surveys, and so on.