"The Pipes and Filters architectural pattern provides a structure for systems that process a stream of data. Each processing step is encapsulated in a filter component. Data [are] passed through pipes between adjacent filters. Recombining filters allows you to build families of related filters." [Buschmann]
The context consists of programs that must process streams of data.
Suppose we need to build a system to solve a problem:
The design of the components and their interconnections must consider the following forces [Buschmann]:
The filters are the processing units of the pipeline. A filter may enrich, refine, or transform its input data [Buschmann].
A filter may be active (the more common case) or passive.
The pipes are the connectors--between a data source and the first filter, between filters, and between the last filter and a data sink. As needed, a pipe synchronizes the active elements that it connects together.
A data source is an entity (e.g., a file or input device) that provides the input data to the system. It may either actively push data down the pipeline or passively supply data when requested, depending upon the situation.
A data sink is an entity that gathers data at the end of a pipeline. It may either actively pull data from the last filter element or it may passively respond when requested by the last filter element.
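The relationships among these four elements can be sketched in Python as follows, using a passive, pull-style connection in which each element requests data from its upstream neighbor. The class and method names here are illustrative assumptions, not part of the pattern as Buschmann states it.

```python
class DataSource:
    """Passively supplies input data when requested."""
    def __init__(self, items):
        self._iter = iter(items)

    def next(self):
        return next(self._iter, None)  # None marks end of input

class Filter:
    """Transforms each data item pulled from its upstream element."""
    def __init__(self, upstream, transform):
        self.upstream = upstream
        self.transform = transform

    def next(self):
        item = self.upstream.next()
        return None if item is None else self.transform(item)

class DataSink:
    """Actively pulls data from the last filter and gathers it."""
    def __init__(self, upstream):
        self.upstream = upstream

    def gather(self):
        results = []
        while (item := self.upstream.next()) is not None:
            results.append(item)
        return results

# A one-filter pipeline: the filter transforms (squares) each item.
source = DataSource([1, 2, 3, 4])
squared = Filter(source, lambda x: x * x)
sink = DataSink(squared)
```

Here the data sink is the single active element; the filter and the data source respond passively when asked for the next item.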
See the Class-Responsibility-Collaborator (CRC) cards for these elements on page 56 of the Buschmann book.
Implementation of the pipes-and-filters architecture is usually not difficult. It often includes the following steps [Buschmann]:
Each step should only depend upon the outputs of the previous step in the sequence. The steps will become the filters in the system.
In dividing up the functionality, be sure to consider variations or later changes that might be needed--a reordering of the steps or substitution of one processing step for another.
For example, Unix pipes carry an unstructured sequence of bytes. However, many Unix filters read and write streams of ASCII characters that are structured into lines (with the newline character as the line terminator).
Another important formatting issue is how the end of the input is marked. A filter might rely upon a system end-of-input condition, or it may need to define its own "sentinel" data value to mark the end.
For example, a pipe connecting active filters might be implemented with an operating system or programming-language runtime facility such as a message queue, a Unix-style pipe, or a synchronized-access bounded buffer.
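As a sketch of the bounded-buffer case, in Python a pipe connecting two active filters might be built from the standard queue.Queue class. The SENTINEL convention for marking end of input is our assumption, not part of the pattern.

```python
import queue
import threading

SENTINEL = object()  # assumed end-of-input marker

def upper_filter(inpipe, outpipe):
    """Active filter: pulls items, transforms them, pushes them on."""
    while (item := inpipe.get()) is not SENTINEL:
        outpipe.put(item.upper())
    outpipe.put(SENTINEL)  # propagate end of input downstream

# Bounded buffers; maxsize trades memory for synchronization overhead.
pipe_in = queue.Queue(maxsize=8)
pipe_out = queue.Queue(maxsize=8)

worker = threading.Thread(target=upper_filter, args=(pipe_in, pipe_out))
worker.start()

for word in ["alpha", "beta"]:
    pipe_in.put(word)
pipe_in.put(SENTINEL)

results = []
while (item := pipe_out.get()) is not SENTINEL:
    results.append(item)
worker.join()
```

The queue's blocking get and put operations provide the synchronization between the two active elements that the pipe is responsible for.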
A pipe connecting to a passive filter might be implemented as a direct call of the adjacent filter: a push connection calls the downstream filter as a procedure; a pull connection calls the upstream filter as a function.
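These two passive connections might be sketched in Python as direct calls (the function names below are illustrative only): in the push case the upstream element drives a passive downstream filter as a procedure; in the pull case the downstream element drives a passive upstream filter, here modeled with generators.

```python
# Push connection: the upstream element drives the passive
# downstream filter by calling it as a procedure.
collected = []

def downstream_filter(item):
    collected.append(item.strip().lower())

def push_source(lines):
    for line in lines:
        downstream_filter(line)  # direct procedure call

# Pull connection: the downstream element drives the passive
# upstream filter by calling it for the next value.
def upstream_source():
    yield from ["  One ", " TWO "]

def pull_filter(source):
    for item in source:  # each iteration pulls one value upstream
        yield item.strip().lower()

push_source(["  One ", " TWO "])
pulled = list(pull_filter(upstream_source()))
```

Both pipelines compute the same result; the difference is only in which element is active and which is passively called.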
The design of a filter is based on the nature of the task to be performed and the natures of the pipes to which it can be connected.
The selection of the size of the buffer inside a pipe is an important performance tradeoff. Large buffers may use up much available memory but likely will involve less synchronization and context-switching overhead. Small buffers conserve memory at the cost of increased overhead.
To make filters flexible and, hence, increase their potential reusability, they often will need different processing options that can be set when they are initiated. For example, Unix filters often take command line parameters, access environment variables, or read initialization files.
Error handling is difficult in a pipes-and-filters system since there is no global state and often multiple asynchronous threads of execution. At the least, a pipes-and-filters system needs mechanisms for detecting and reporting errors. An error should not result in incorrect output or other damage to the data.
For example, a Unix program can use the stderr channel to report errors to its environment.
More sophisticated pipes-and-filters systems should seek to recover from errors. For example, the system might discard bad input and resynchronize at some well-defined point later in the input data. Alternatively, the system might back up the input to some well-defined point and restart the processing, perhaps using a different processing method for the bad data.
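As a minimal sketch of the first recovery strategy, a line-structured filter might discard a malformed record and resynchronize at the next record boundary, here simply the next input line. The error-reporting scheme is our assumption.

```python
def robust_int_filter(lines, errors):
    """Pass along integer records; discard bad input and
    resynchronize at the next line (a well-defined point)."""
    for lineno, line in enumerate(lines, start=1):
        try:
            yield int(line)
        except ValueError:
            # Report the error without damaging the output data.
            errors.append(f"line {lineno}: bad record {line!r}")

errors = []
output = list(robust_int_filter(["1", "oops", "3"], errors))
```

The bad record is detected and reported, but it neither appears in the output nor corrupts the records that follow it.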
One approach is to use a standardized main program to create, connect, and initiate the needed pipe and filter elements of the pipeline.
Another approach is to use an end-user tool, such as a command shell or a visual pipeline editor, to create, connect, and initiate the needed pipe and filter elements of the pipeline.
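A standardized main program of the first kind might be sketched as follows, where each filter is a generator function and the pipeline configuration is just an ordered list of filters. The helper names are ours, not a standard API.

```python
def build_pipeline(source, filters):
    """Create and connect the pipeline: each filter wraps the
    stream produced by its upstream neighbor."""
    stream = iter(source)
    for filt in filters:
        stream = filt(stream)
    return stream

def drop_blank(stream):
    for item in stream:
        if item.strip():
            yield item

def to_lower(stream):
    for item in stream:
        yield item.lower()

# The main program initiates the pipeline by draining the stream.
pipeline = build_pipeline(["Hello", "  ", "World"], [drop_blank, to_lower])
result = list(pipeline)
```

Reordering or substituting processing steps then amounts to editing the filter list, which is the kind of reconfiguration the pattern is meant to support.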
An example pipes-and-filters system might be a retargetable compiler for a programming language. The system might consist of a pipeline of processing elements similar to the following:
Note: This element handles context-sensitive syntactic issues such as type checking and type conversion in expressions.
Note: A global optimizer may transform the program by operations such as factoring out common subexpressions and moving statements outside of loops.
Note: A local optimizer may transform the program by removing unneeded loads and stores of data.
The pipeline can be reconfigured to support a number of different variations:
Note: If the parser is driven by tables that describe the grammar, then it may be possible to use the same parser with a different table.
Of course, a pure active-filters system as described above for a compiler may not be very efficient or convenient.
In the compiler pipeline, the symbol table is a key component of the global state that is constructed by the lexical analyzer and needed by the phases downstream through (at least) the intermediate code generator.
In the compiler pipeline, it may be useful to combine the phases from lexical analysis through intermediate code generation into one program because they share the symbol table. Performance can be further improved by having the parser directly call the lexical analyzer when the next token is needed.
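The parser-calls-lexer arrangement can be sketched in Python by making the lexical analyzer a passive generator that the parser pulls from on demand. The toy token scheme below is our assumption, not a real compiler front end.

```python
def lexer(text):
    """Passive lexical analyzer: yields the next token only when
    the parser asks for it."""
    for word in text.split():
        kind = "NUM" if word.isdigit() else "ID"
        yield (kind, word)

def parser(text):
    """Driver phase: directly calls the lexer for the next token."""
    tokens = lexer(text)
    tree = []
    for kind, value in tokens:  # each iteration pulls one token
        tree.append((kind, value))
    return tree

parse_tree = parser("x 42 y")
```

No buffer of tokens ever exists; each token is produced exactly when the parser needs it, which is the performance benefit of the direct-call connection.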
For example, the symbol table information is not usually required during backend code generation, interpretation, or execution. However, some of the symbol table information, such as variable and procedure names, may be useful in generating error messages and execution traces or for use by runtime debugging tools.
So far we have focused on single-input single-output filters. A generalization of the pipes-and-filters pattern allows filters with multiple input and/or multiple output pipes to be connected in any directed graph structure.
In general, such dataflow systems are difficult to design so that they compute the desired result and terminate cleanly. However, if we restrict ourselves to directed acyclic graph structures, the problem is considerably simplified.
In the UNIX operating system shell, the tee filter provides a mechanism to split a stream into two streams, named pipes provide mechanisms for constructing network connections, and filters with multiple input files/streams provide mechanisms for joining two streams.
Consider the following UNIX shell commands. On a Solaris machine, this sequence sets up a pipe to build a sorted list of all words that occur more than once in a file:
    # create two named pipes
    mknod pipeA p
    mknod pipeB p
    # set up side chain computation (running in the background)
    cat pipeA >pipeB &
    # set up main pipeline computation
    cat filename | tr -cs "[:alpha:]" "[\n*256]" \
    | tr "[:upper:]" "[:lower:]" | sort | tee pipeA | uniq \
    | comm -13 - pipeB | uniq
The mknod commands set up two named pipes, pipeA and pipeB, for connecting to a "side chain" computation.
The side chain consists of a cat program running in a background fork (note the &). The program takes its input from the pipe named pipeA and writes its output to the pipe named pipeB.
The main pipeline uses a cat filter as a source for the stream. The next two stages use the filter tr to translate each sequence of non-alphabetic characters to a single newline character and to map all uppercase characters to lowercase, respectively. The words are now in a standard form--in lowercase, one per line.
The words are then put into ascending alphabetical order by the sort filter.
The next stage uses the tee filter to replicate the stream, sending one copy down the main pipeline and another copy onto the side chain via pipeA.
The side chain simply copies the stream from pipeA onto pipeB. Meanwhile, the main pipeline uses the uniq filter to remove adjacent duplicate words.
The two streams are then joined by the comm filter. The comm filter takes two inputs, one from the main pipeline's stream (note the - parameter) and another from pipeB.
The -13 option causes the comm filter to output the lines that appear in the second stream (i.e., pipeB) but not in the first stream (i.e., the main pipeline). Thus, the output is an alphabetical list of words that appear more than once in the input file.
The last stage, another uniq filter, removes duplicates from the final output.
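The same computation can be sketched as a Python pipeline of generator filters that mirrors the shell stages: split into lowercase words, sort, then emit the words that occur more than once. The helper names are ours.

```python
import re

def words(text):
    """Like the two tr stages: split on non-alphabetic characters
    and map everything to lowercase, one word per item."""
    for word in re.split(r"[^A-Za-z]+", text):
        if word:
            yield word.lower()

def duplicates(sorted_words):
    """Like the comm -13 stage plus the final uniq: emit each word
    that appears more than once in the sorted stream, once."""
    previous, emitted = None, None
    for word in sorted_words:
        if word == previous and word != emitted:
            yield word
            emitted = word
        previous = word

text = "The cat saw the dog, and the dog saw the cat."
result = list(duplicates(sorted(words(text))))
```

As in the shell version, the sort stage must consume its entire input before producing any output, so it would be the bottleneck on a large file.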
The pipes-and-filters architectural pattern has the following benefits [Buschmann]:
The pipes-and-filters architectural pattern has the following liabilities [Buschmann]:
A filter such as sort, which cannot produce any output until it has consumed all of its input, can become the bottleneck of a system.
The first version of these notes was written during the spring 1998 semester for the Special Topics in Software Architecture class (offered as ENGR 691).