Notes on Software Patterns:
Pipes and Filters
Architectural Pattern

H. Conrad Cunningham

16 April 2022

Copyright (C) 1998-2022, H. Conrad Cunningham
Professor of Computer and Information Science
University of Mississippi
214 Weir
P.O. Box 1848
University, MS 38677
(662) 915-7396 (dept. office)

Browser Advisory: The HTML version of this textbook requires a browser that supports the display of MathML. A good choice as of April 2022 is a recent version of Firefox from Mozilla.

2 Pipes and Filters Architectural Pattern

2.1 Chapter Introduction

The goal of this chapter is to present the Pipes and Filters architectural pattern. The presentation follows that of Buschmann et al. [1].

There is an accompanying set of PowerPoint slides (not fully updated in 2022).

2.2 Definition

Buschmann et al. [1] define the Pipes and Filters architectural pattern as follows:

“The Pipes and Filters architectural pattern provides a structure for systems that process a stream of data. Each processing step is encapsulated in a filter component. Data are passed through pipes between adjacent filters. Recombining filters allows you to build families of related filters.”

2.3 Context

The context consists of programs that must process streams of data.

2.4 Problem

Suppose we need to build a system that must process or transform a stream of input data in several steps.

The design of the components and their interconnections must consider several forces [1]: it should be possible to enhance the system by exchanging or recombining processing steps, small steps should be easy to reuse, non-adjacent steps should not share information, and parallel execution of the steps should not be ruled out.

2.5 Solution

The solution is to divide the system's task into a sequence of self-contained processing steps (filters) connected by channels for the flow of data (pipes), as detailed in the following sections.

2.6 Structure

The filters are the processing units of the pipeline. A filter may enrich, refine, or transform its input data [1].

A filter may be active (the more common case) or passive.

The pipes are the connectors: between a data source and the first filter, between filters, and between the last filter and a data sink. As needed, a pipe synchronizes the active elements that it connects.

A data source is an entity (e.g., a file or input device) that provides the input data to the system. It may either actively push data down the pipeline or passively supply data when requested, depending upon the situation.

A data sink is an entity that gathers data at the end of a pipeline. It may either actively pull data from the last filter element or it may passively respond when requested by the last filter element.

(See the Class-Responsibility-Collaborator (CRC) cards for these elements on page 56 of Buschmann et al. [1])

TODO: Consider whether the above paragraph is needed or can be replaced without depending upon the book completely.
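To make these four kinds of elements concrete, the following sketch (in Python, using hypothetical names and an assumed input file; it is not taken from Buschmann et al. [1]) models a pull-style pipeline of passive filters in which the data sink drives the computation by demanding data from the elements upstream.

    import sys

    def data_source(file):            # data source: supplies the input data
        for line in file:
            yield line

    def to_lower(upstream):           # passive filter: transforms its input
        for line in upstream:
            yield line.lower()

    def drop_blank(upstream):         # passive filter: refines (discards) data
        for line in upstream:
            if line.strip():
                yield line

    def data_sink(upstream):          # data sink: pulls and gathers the results
        for line in upstream:
            sys.stdout.write(line)

    # Each call connects two elements; the sink's demand drives the pipeline.
    with open("input.txt") as f:      # "input.txt" is a hypothetical input file
        data_sink(drop_blank(to_lower(data_source(f))))

Because every filter here is passive, no explicit synchronization is needed; each generator runs only when the sink demands the next line.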

2.7 Implementation

Implementation of the pipes-and-filters architecture is usually not difficult. It often includes the following steps [1]:

  1. Divide the functionality of the problem into a sequence of processing steps.

    Each step should only depend upon the outputs of the previous step in the sequence. The steps will become the filters in the system.

    In dividing up the functionality, be sure to consider variations or later changes that might be needed—a reordering of the steps or substitution of one processing step for another.

  2. Define the type and format of the data to be passed along each pipe.

    For example, Unix pipes carry an unstructured sequence of bytes. However, many Unix filters read and write streams of ASCII characters that are structured into lines (with the newline character as the line terminator).

    Another important formatting issue is how the end of the input is marked. A filter may rely upon a system end-of-input condition, or it may need to implement its own “sentinel” data value to mark the end.

  3. Determine how to implement each pipe connection.

    For example, a pipe connecting active filters might be implemented with an operating system or programming language runtime facility such as a message queue, a Unix-style pipe, or a synchronized-access bounded buffer (see the sketch following this list).

    A pipe connecting to a passive filter might be implemented as a direct call of the adjacent filter: a push connection calls the downstream filter as a procedure, and a pull connection calls the upstream filter as a function.

  4. Design and implement the filters.

    The design of a filter is based on the nature of the task to be performed and the natures of the pipes to which it can be connected.

    • An active filter needs to run with its own thread of control. It might run as a “heavyweight” operating system process (i.e., having its own address space) or as a “lightweight” thread (i.e., sharing an address space with other threads).

    • A passive filter does not require a separate thread of control (although it could be implemented with a separate thread).

    The selection of the size of the buffer inside a pipe is an important performance tradeoff. Large buffers may consume much of the available memory but likely involve less synchronization and context-switching overhead. Small buffers conserve memory at the cost of increased overhead.

    To make filters flexible and, hence, increase their potential reusability, they often will need different processing options that can be set when they are initiated. For example, Unix filters often take command line parameters, access environment variables, or read initialization files.

  5. Design for robust handling of errors.

    Error handling is difficult in a pipes-and-filters system since there is no global state and there are often multiple asynchronous threads of execution. At the least, a pipes-and-filters system needs mechanisms for detecting and reporting errors. An error should not result in incorrect output or other damage to the data.

    For example, a Unix program can use the stderr channel to report errors to its environment.

    More sophisticated pipes-and-filters systems should seek to recover from errors. For example, the system might discard bad input and resynchronize at some well-defined point later in the input data. Alternatively, the system might back up the input to some well-defined point and restart the processing, perhaps using a different processing method for the bad data.

  6. Configure the pipes-and-filters system and initiate the processing.

    One approach is to use a standardized main program to create, connect, and initiate the needed pipe and filter elements of the pipeline.

    Another approach is to use an end-user tool, such as a command shell or a visual pipeline editor, to create, connect, and initiate the needed pipe and filter elements of the pipeline.
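The following sketch ties steps 2 through 4 together (in Python; the names, buffer size, and sample data are illustrative assumptions, not part of the pattern): the pipes are synchronized bounded buffers, two active filters run in their own threads, and a sentinel object marks the end of the input.

    import queue
    import threading

    END = object()                       # sentinel marking end of input (step 2)

    def make_pipe():                     # pipe: synchronized bounded buffer (step 3)
        return queue.Queue(maxsize=64)   # buffer size is a performance tradeoff

    def active_filter(transform, inpipe, outpipe):
        # Active filter: loops in its own thread of control (step 4).
        while True:
            item = inpipe.get()
            if item is END:              # propagate the sentinel, then terminate
                outpipe.put(END)
                return
            outpipe.put(transform(item))

    pipe1, pipe2, pipe3 = make_pipe(), make_pipe(), make_pipe()

    filters = [
        threading.Thread(target=active_filter, args=(str.lower, pipe1, pipe2)),
        threading.Thread(target=active_filter, args=(str.strip, pipe2, pipe3)),
    ]
    for f in filters:
        f.start()

    for word in ["Alpha ", " Beta", "GAMMA "]:   # data source pushes input
        pipe1.put(word)
    pipe1.put(END)

    while (item := pipe3.get()) is not END:      # data sink pulls results
        print(item)

    for f in filters:
        f.join()

Here the bounded queues provide the synchronization that Section 2.6 assigns to the pipes, and each queue's maxsize realizes the buffer-size tradeoff discussed in step 4.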

2.8 Example

An example pipes-and-filters system might be a retargetable compiler for a programming language. The system might consist of a pipeline of processing elements similar to the following:

  1. A source element reads the program text (i.e., source code) from a file (or perhaps a sequence of files) as a stream of characters.

  2. A lexical analyzer converts the stream of characters into a stream of lexical tokens for the language—keywords, identifier symbols, operator symbols, etc.

  3. A parser recognizes a sequence of tokens that conforms to the language grammar and translates the sequence to an abstract syntax tree.

  4. A “semantic” analyzer reads the abstract syntax tree and writes an appropriately augmented abstract syntax tree.

    Note: This element handles context-sensitive syntactic issues such as type checking and type conversion in expressions.

  5. A global optimizer (usually optionally invoked) reads an augmented syntax tree and outputs one that is equivalent but corresponds to a program that is more efficient in space and time resource usage.

    Note: A global optimizer may transform the program by operations such as factoring out common subexpressions and moving statements outside of loops.

  6. An intermediate code generator translates the augmented syntax tree to a sequence of instructions for a virtual machine.

  7. A local optimizer converts the sequence of intermediate code (i.e., virtual machine) instructions into a more efficient sequence.

    Note: A local optimizer may transform the program by removing unneeded loads and stores of data.

  8. A backend code generator translates the sequence of virtual machine instructions into a sequence of instructions for some real machine platform (i.e., for some particular hardware processor augmented by operating system calls and a runtime library).

  9. If the previous step generated symbolic assembly code, then an assembler is needed to translate the sequence of symbolic instructions into a relocatable binary module.

  10. If the previous steps of the pipeline generated a sequence of separate binary modules, then a linker might be needed to bind the separate modules with library modules to form a single executable (i.e., object code) module.

  11. A sink element outputs the resulting binary module into a file.

The pipeline can be reconfigured to support a number of different variations: for example, the backend code generator can be exchanged to target a different hardware platform, the front-end elements can be exchanged to support a different source language, or the optimizers can be omitted to speed up compilation.

Of course, a pure active-filters system as described above for a compiler may not be very efficient or convenient.
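One way to see this flexibility is to treat each stage as a function from one program representation to the next and to represent the pipeline itself as a list of stages. In the sketch below (Python; the stage bodies are placeholders, not a working compiler), reconfiguring the pipeline reduces to editing that list.

    from functools import reduce

    # Placeholder stages; each consumes one program representation and
    # produces the next.
    def lex(chars):      return f"tokens({chars})"
    def parse(tokens):   return f"ast({tokens})"
    def analyze(ast):    return f"augmented({ast})"
    def gen_ir(ast):     return f"ir({ast})"
    def optimize(ir):    return f"optimized({ir})"
    def gen_code(ir):    return f"code({ir})"

    def run_pipeline(stages, source_text):
        return reduce(lambda data, stage: stage(data), stages, source_text)

    full = [lex, parse, analyze, gen_ir, optimize, gen_code]
    fast = [lex, parse, analyze, gen_ir, gen_code]   # optimizers omitted

    print(run_pipeline(fast, "x := 1"))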

2.9 Variants

So far we have focused on single-input single-output filters. A generalization of the pipes-and-filters pattern allows filters with multiple input and/or multiple output pipes to be connected in any directed graph structure.

In general, such dataflow systems are difficult to design so that they compute the desired result and terminate cleanly. However, if we restrict ourselves to directed acyclic graph structures, the problem is considerably simplified.

In the Unix operating system shell, the tee filter provides a mechanism to split a stream into two streams, named pipes provide a mechanism for constructing connections within a network of filters, and filters that accept multiple input files/streams provide mechanisms for joining two streams.

TODO: Bring the following up to date?

Consider the following Unix shell commands. On a Solaris “Unix” machine (late 1990s), this sequence sets up a network of pipes to build a sorted list of all words that occur more than once in a file:

    # create two named pipes
    mknod pipeA p
    mknod pipeB p
    # set up side chain computation (running in the background)
    cat pipeA >pipeB &
    # set up main pipeline computation
    cat filename | tr -cs "[:alpha:]" "[\n*256]" \
                 | tr "[:upper:]" "[:lower:]" | sort | tee pipeA | uniq \
                 | comm -13 - pipeB | uniq 
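The same stream-splitting idea can also be expressed inside a multithreaded pipes-and-filters program. The following sketch (in Python; an assumed illustration rather than a transcription of the shell example above) shows a tee filter that copies each item from its input pipe onto two output pipes, letting a pipeline fan out into a directed acyclic graph.

    import queue
    import threading

    END = object()                          # end-of-stream sentinel

    def tee(inpipe, outpipe1, outpipe2):
        # Tee filter: copy every item from inpipe onto both output pipes.
        while True:
            item = inpipe.get()
            outpipe1.put(item)
            outpipe2.put(item)
            if item is END:
                return

    # Run the tee in its own thread between one upstream pipe and two
    # downstream branches of the processing graph.
    inp, out1, out2 = queue.Queue(), queue.Queue(), queue.Queue()
    threading.Thread(target=tee, args=(inp, out1, out2)).start()
    for item in ["a", "b", END]:
        inp.put(item)
    print(out1.get(), out1.get())           # prints: a b
    print(out2.get(), out2.get())           # prints: a b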

2.10 Consequences

2.10.1 Benefits

The pipes-and-filters architectural pattern has the following benefits [1]:

  • Intermediate files unnecessary, but possible. File system clutter is avoided and concurrent execution is made possible.

  • Flexibility by filter exchange. It is easy to exchange one filter element for another with the same interfaces and functionality.

  • Flexibility by recombination. It is not difficult to reconfigure a pipeline to include new filters or perhaps to use the same filters in a different sequence.

  • Reuse of filter elements. The ease of filter recombination encourages filter reuse. Small, active filter elements are normally easy to reuse if the environment makes them easy to connect.

  • Rapid prototyping of pipelines. Flexibility of exchange and recombination and ease of reuse enables the rapid creation of prototype systems.

  • Efficiency by parallel processing. Since active filters run in separate processes or threads, pipes-and-filters systems can take advantage of a multiprocessor.

2.10.2 Liabilities

The pipes-and-filters architectural pattern has the following liabilities [1]:

  • Sharing state information is expensive or inflexible. The information must be encoded, transmitted, and then decoded.

  • Efficiency gain by parallel processing is often an illusion. The costs of data transfer, synchronization, and context switching may be high. Non-incremental filters, such as the Unix sort, can become the bottleneck of a system.

  • Data transformation overhead. The use of a single data channel between filters often means that much transformation of data must occur, for example, translation of numbers between binary and character formats.

  • Error handling. It is often difficult to detect errors in pipes-and-filters systems. Recovering from errors is even more difficult.

2.11 What Next?

TODO

2.12 Acknowledgements

I wrote the first version of these notes for my Spring 1998 graduate Special Topics in Software Architecture class. I based the notes, in part, on the treatments of the pattern in Buschmann et al. [1] and Shaw and Garlan [2].

I revised the notes somewhat for related courses in 1999, 2000, 2001, 2002, and 2004. Also, in 2004 I revised the notes, created slides, and included them as a part of the materials supported by a grant from the Acxiom Corporation titled “The Acxiom Laboratory for Software Architecture and Component Engineering (ALSACE)”. My ALSACE research team included PhD students Yi Liu and Pallavi Tadepalli and MS students Mingxian Fu and “Melody” Hui Xiong.

In Spring 2017, I adapted the earlier notes to use Pandoc-flavored Markdown. In Spring 2018 I revised the notes slightly to fit in with the other documents for the CSci 658 course.

I retired from the full-time faculty in May 2019. As one of my post-retirement projects, I am continuing work on possible textbooks based on the course materials I had developed during my three decades as a faculty member. In January 2022, I began refining the existing content, integrating separately developed materials together, reformatting the documents, constructing a unified bibliography (e.g., using citeproc), and improving my build workflow and use of Pandoc.

TODO: Modify acknowledgements appropriately for 2022+ updates.

I maintain this chapter as text in Pandoc’s dialect of Markdown using embedded LaTeX markup for the mathematical formulas and then translate the document to HTML, PDF, and other forms as needed.

2.13 Terms and Concepts

TODO

2.14 References

[1]
Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, and Michael Stal. 1996. Pattern-oriented software architecture: A system of patterns. Wiley, Hoboken, New Jersey, USA.
[2]
Mary Shaw and David Garlan. 1996. Software architecture: Perspectives on an emerging discipline. Prentice-Hall, Englewood Cliffs, New Jersey, USA.