The most effective weapon that computing scientists have in their fight against complexity is abstraction.
Sometimes abstraction is described: remembering the "what" and ignoring the "how".
Large complex systems can only be made understandable by decomposing them into modules. When viewed from the outside, each module should be simple, with the complexity hidden inside. We strive for modules that have simple interfaces that can be used without knowing the implementations.
Two kinds of abstraction are of interest to computing scientists: procedural abstraction and data abstraction.
When we develop an algorithm following the top-down approach, we are practicing procedural abstraction. At a high level, we break the problem up into several tasks. We give each task a name and state its requirements, but we do not worry about how the task is to be accomplished until we expand it at a lower level of our design.
When we code a task in a programming language, we will typically make each task a procedure (or function). Any other program component that calls the procedure needs to know its interface (name, parameters, assumptions, etc.) but does not need to know the procedure's internal implementation details. The internal implementation can be changed without affecting the caller.
In data abstraction, the focus is on the problem's data rather than the tasks to be carried out. We examine the concepts of data abstraction in the sections that follow.
In Pascal, all data structures are visible. A programmer can define custom data types, yet their structure is known to other parts of the program. These are concrete data structures.
As an example, consider a collection of records about the employees of a company. Suppose we store these records in a global Pascal array. The array and all its elements are visible to all parts of the program. Any statement in the program can directly access and modify the elements of the array.
An abstract data structure is a module consisting of data and operations. The data are hidden within the module and can only be accessed by means of the operations. The data structure is called abstract because its name and its interface are known, but not its implementation.
Abstract data structures support information hiding. Their implementation is hidden behind an interface that remains unchanged, even if the implementation changes.
The concept of encapsulation is related to the concept of information hiding. The data and the operations that manipulate the data are all combined in one place. That is, they are encapsulated within a module.
An abstract data structure has a state that can be manipulated by the operations. The state is a value, or collection of information, held by the abstract data structure.
As an example, again consider the collection of records about the employees of a company. Suppose we impose a discipline on our Pascal program, only allowing the collection of records to be accessed through a small group of procedures. Inside this group of procedures, the array of records can be manipulated directly. However, all other parts of the program must use one of the procedures in the group to manipulate the records in the collection. The fact that the collection is implemented with an array is (according to the discipline we imposed) hidden behind the interface provided by the group of procedures.
Now suppose we wish to modify our program and change the implementation from an array to a linked list or maybe to move the collection to a disk file. By approaching the design of the collection as an abstract data structure, we have limited the parts of the program that must be changed to the small group of procedures that used the array directly; other parts of the program are not affected.
As another example of an abstract data structure, consider a stack.
We provide operations like push
, pop
, and
empty
to allow a user of the stack to access and
manipulate it. Except for the code implementing these operations, we
disallow direct access to the concrete data structure that implements
the stack. The implementation might use an array, a linked list, or
some other concrete data structure; the actual implementation is
"hidden" from the user of the stack.
There is only one instance of an abstract data structure. Often we need to create multiple instances of a data structure. For example, we might need to have a collection of employee records for each different department within a large company.
We need to go a step beyond the abstract data structure and define an abstract data type (ADT).
What do we mean by type?
Consider the built-in type INTEGER
in Pascal. By
declaring a Pascal variable to be of type INTEGER
, we are
specifying that the variable has the characteristics of that type:
INTEGER
, a subset of the
mathematical set of integers,
INTEGER
, addition, multiplication, comparison for
equality, etc.
Suppose we declare a Pascal variable to have type
INTEGER
. By that declaration, we are creating a
container in the program's memory that, at any point in time, holds a
single value drawn from the INTEGER
domain. The contents
of this container can be operated upon by the INTEGER
operations. In a program, we can declare several INTEGER
variables: each variable may have a different value, yet all of them
have the same set of operations.
Conceptually, an abstract data type is a set of entities whose logical behavior is defined by a domain of values and a set of operations on that domain. In the terminology we used above, an ADT is set of abstract data structures all of whom have the same domain of possible states and have the same set of operations.
We will refer to a particular abstract data structure from an ADT as an instance of the ADT.
The implementation of an ADT in a language like Pascal is similar to that discussed above for abstract data structures. In addition to providing operations to access and manipulate the data, we need to provide operations to create and destroy instances of the ADT. All operations (except create) must have as a parameter an identifier (e.g., a pointer) for the particular instance to be operated upon.
While Pascal does not directly support ADTs, the
The behavior of an ADT is defined by a set of operations that can be
applied to an instance of the ADT.
Each operation of an ADT can have inputs (i.e., parameters) and
outputs (i.e., results).
The collection of information about the names of the operations and
their inputs and outputs is the interface of the ADT.
To specify an ADT interface, we need to give:
Here we give an example specification of an ADT, a (bounded) stack.
We give the various components of the specification in detail in this
example. In later specifications we will leave some of the details
implicit when that does not lead to confusion.
To specify the signatures for the operations, we use the notation for
mathematical functions. By a tuple like
We categorize the operations into one of four groups depending upon
their functionality:
Constructors:
Mutators:
Accessors:
Destructors:
Note that the operation
Also note that operation
To the give the semantics (meaning) of each operation, we associate a
precondition and a postcondition with each operation. These are
logical assertions about the values of the input parameters and output
values of the operations.
Although the presentation here is informal, we try to be precise in
the statement of the pre- and postconditions.
Note that each operation except the constructor (
Also note that all of these
In some sense, this "validity" property is invariant for an
instance of the ADT; the constructor makes the property true, all
mutators and accessors preserve its truth, and the destructor makes it
false. (An invariant property must hold between operations on the
instance; it might not hold during the execution of an operation.)
Aside: An invariant on an ADT instance is similar in concept to an
invariant for a while-loop. A loop invariant holds before and after
each execution of the loop.
As a convenience in specification we will sometimes state the
invariants of the ADT separately from the pre- and postconditions of
the methods. We often will divide the invariants into two groups.
The interface invariants are part of the public interface of the ADT.
The implementation invariants are part of the hidden state of an
instance; in some cases, they define the meaning of the abstract
properties stated in the interface invariants.
A Java
Like the Pascal record type, a Java class can consist of several
components. In Pascal, all the components are data fields. However,
in Java, functions and procedures may be included as components of a
class. These procedures and functions are called methodsclass
construct provides a direct way to define ADTs in languages like C++
and Java.
.
Defining ADTs
An Example ADT: Stack
Name
Stack
(of Item
)
Sets
Stack
:
(This is the set we are defining with the ADT.)
Item
:
boolean
:
int
:
Signatures
(Stack, Item)
,
we mean the Cartesian product of sets Stack
and
Item
, that is, the set of ordered pairs where the first
component is from Stack
and the second is from
Item
. The set to the right of the ->
is the return type of the function.
We will normally list the operations in that order.
create: int -> Stack
push: (Stack, Item) -> Stack
pop: Stack -> Stack
top: Stack -> Item
empty: Stack -> boolean
full: Stack -> boolean
destroy: Stack ->
pop
does not return the top item
on the stack as it does in some formulations. We have separated out
the "return top" functionality into operation top
. The
operation pop
merely discards the top item on the stack.
destroy
does not return a
value.
Semantics
create(int size) -> Stack S'
size >= 0
S'
is a valid new instance of
Stack
that has the capacity to store size
items but is empty initially.
push(Stack S, Item I) -> Stack S'
S
is a valid Stack
instance that is not
full.
S'
is a valid Stack
instance.
S' = S
with I
added as the new top.
pop(Stack S) -> Stack S'
S
is a valid Stack
instance that is
not empty.
S'
is a valid Stack
instance.
S' = S
with the top item deleted.
top(Stack S) -> Item I
S
is a valid Stack
instance that is
not empty.
I =
the top item on S
.
(S
is not modified by this operation.)
empty(Stack S) -> boolean e
S
is a valid Stack
instance.
e
is true
if and only if S
contains no elements (i.e., is empty).
(S
is not modified by this operation.)
full(Stack S) -> boolean f
S
is a valid Stack
instance.
f
is true
if and only if S
contains no space for additional items (i.e., is full).
(S
is not modified by this operation.)
destroy(Stack S) ->
S
is a valid Stack
instance.
Stack S
no longer exists.
create
)
has a Stack
instance as an input; the constructor and
each of the mutators also has a Stack
instance as an
output. This parameter identifies the particular instance that the
operation is manipulating.
Stack
instances are required
to be "valid" in all preconditions and postconditions, except the
precondition of the constructor and the postcondition of the
destructor. By valid we mean that the state of the instance is within
the acceptable domain of values; it has not become corrupted or
inconsistent. What is specifically mean by "valid" will differ from
one implementation of a stack to another.
Java Classes
class
is similar to a user-defined
record type in Pascal. It is a template for
constructing data items that have the same structure but differing
values (states). We say that an item constructed by a class is a
class instance (or, as we see later, an object).
The methods declared in a class may be either instance methods or class methods. We can consider an instance method as being associated with an instance of the class and a class method as being associated with the class as a whole, not with any specific instance.
We declare a method as a class method by giving the keyword
static
in its header; a method defined without the
static
keyword is an instance method. For example, a
main
method of a program is a class method of the class
in which it is defined.
public static void main(String[] args) { // beginning code for the program }
Similarly, the variables (data fields) declared in a class may be
either instance variables or class variables. As in the case of
methods, an instance variable is associated with an instance
of the class; each instance has its own copy of the variable.
A class variable is associated with the class as a whole;
there is only one copy of the variable for the entire class.
As with methods, the keyword static
denotes class variables.
An instance method has access to the instance variables of the class instance (object) to which it is applied. The instance's variables are implicit arguments of the method calls. The instance methods also have access to the class variables (if any).
Class methods only have access to the class variables. The methods do not have any implicit arguments. In fact, class methods can be called without any instances of the class being in existence.
The components of a class can be designated as public
or
private
. The public components of the class are
accessible from anywhere in the program. The private components are
only accessible from inside the class.
As a general rule, the data fields of a class should be private instance variables, meaning that they are associated with a specific instance and are only accessible by the instance's methods. This hides, or encapsulates, the data fields within class instance.
In addition to the primitive types, we can declare a Java variable to have a class as its type. These variables are actually containers for references to (i.e., addresses of) the instances of the class. The class instances themselves are stored in the dynamically managed heap memory area; Java allocates memory from the heap to hold newly constructed instances of a class. In general, avoid public variables.
To implement an ADT in Java, we can do the following:
class
construct to represent the entire
ADT. If we want to allow access to the Stack
class from
anywhere in the program, we will make the class public
.
For the Stack
ADT, we can use the following structure for
the class:
public class Stack { // implementation of instance methods and data here }
For example, to declare a variable that can hold a reference to a
Stack
instance, we can use the following declaration:
Stack stk;
A Java constructor is a method with the same name as the class. Upon
creation of an instance of the class, the constructor initializes the
instance's state. A constructor is normally invoked by the Java
operator new
. The operator new
allocates
memory on the heap for the instance, calls the constructor to
initialize it, and then returns a reference to the new instance.
For example, we can represent the ADT operation
create
by the constructor method Stack
.
public class Stack { public Stack(int size) { // initialization code } // rest of Stack methods and data ... }
A user of the Stack
can then declare a variable and
initialize it to hold a reference to a new stack with a capacity of
100 items as follows:
Stack stk = new Stack(100);
The expression new Stack(100)
allocates a
Stack
instance in the heap storage and calls the
constructor above to initialize the data fields encapsulated within
the instance.
We can apply a method to a class instance by using the selector (i.e.,
"dot") notation. This notation is similar to the notation for
accessing record
components in Pascal.
For example, in the case of the Stack
ADT we can
represent the operations as instance methods of class
Stack
. The explicit Stack
parameters and
return values of the operations thus become implicit.
Suppose we want to push an item x
onto the stk
created above. We can do that with the following code:
if (!stk.full()) stk.push(x);
We can then examine the top item and remove it:
if (!stk.empty()) { it = stk.top(); stk.pop(); }
public
methods of the class. That is, precede the
method's definition by the keyword public
.
void
) methods. These modify the encapsulated state of
the class instance (which is the implicit argument of the methods).
public void pop() { // code to implement operation }
public boolean empty() { // code to implement operation }
public void destroy() { // code to free resources }
private
data fields of the Java class to
represent the encapsulated state of the instance needed for a
particular implementation. By making the data fields
private
they are still available to the instance's
methods, but are not visible outside the class.
public class Stack { // public operations of class instance // encapsulated data fields of class instance private int topItem; // Pointer to next index for insertion private int capacity; // Maximum number of items in stack private Object[] stk; // the stack }
public
data fields in the class. They
violate the principle of information hiding. Instead introduce
appropriate accessor and mutator methods to allow manipulation of the
hidden state.
private
methods to aid in
implementation. Functionality common to several methods can be placed
in separate functions and procedures as appropriate. However, since
these are private
, they can only be accessed from within
the class and thus can be changed without effecting the public
interface of the class.
Java does not currently have a parameterized class facility (like the
C++ template
or Ada generic
mechanisms). We
must handle the type parameters of the ADT in other ways.
For example, in the next section we represent the set
Item
of the Stack
ADT by the class
Object
. As we will see when we discuss inheritance, the
Object
type will allow us to store an instance of any
class on the Stack
. With this definition, any data of a
reference type can appear in the stack, but values of the primitive
types cannot.
In this section, we give an implementation of the Stack
that uses an array of objects and an integer "pointer" to represent
the stack. This implementation is not robust; each operation assumes
that its precondition holds. A more robust implementation might check
whether the precondition holds and throw an exception if it does not.
Remember that the invariants are implicitly pre- and postconditions of all mutator and accessor methods, postconditions of the constructor, and preconditions of the destructor.
// A Stack ADT public class Stack { // Interface Invariant: Once created and until destroyed, this // instance has a valid and consistent internal state public Stack(int size) // Pre: size >= 0 // Post: initialized empty Stack instance with capacity size { stk = new Object[size]; capacity = size; topItem = 0; } public void push(Object I) // Pre: this instance is not full // Post: I added as the new top of this instance's stack { stk[topItem] = I; topItem++; } public void pop() // Pre: this instance is not empty // Post: item at top of stack removed from this instance { topItem--; stk[topItem] = null; } public Object top() // Pre: this instance is not empty // Post: item at top of this instance's stack returned { return stk[topItem-1]; } public boolean empty() // Pre: true // Post: return true iff this instance's stack has no elements { return (topItem <= 0); } public boolean full() // Pre: true // Post: return true iff this instance's stack is at full capacity { return (topItem >= capacity); } public void destroy() // Pre: true // Post: internal resources released; stack effectively deleted { stk = null; capacity = 0; topItem = 0; } // Implementation Invariants: 0 <= topItem <= capacity // stack is in array section stk[0..topItem-1] // with the top at stk[topItem-1], etc. private int topItem; // Pointer to next index for insertion private int capacity; // Maximum number of items in stack private Object[] stk; // the stack }
Consider an ADT for storing and manipulating calendar dates. We will
call the ADT Day
to avoid confusion with the
Date
class in the Java API. This ADT is based on the
Day
class defined in Chapter 4 of the first edition of
the book Core Java.
Logically, a calendar date consists of three pieces of information: a
year designator, a month designator, and a
day of the month designator. A secondary piece of
information is the day of the week. In this ADT interface definition,
we use integers (i.e., Java int
) to designate these
pieces of information.
create(int y, int m, int d) -> Day D'
y != 0 && 1 <= m <= 12 && 1 <= d <= #days in month m
.(y,m,d)
does not fall in the gap formed by the
change to the modern (Gregorian) calendar.
D'
is a valid new instance of
Day
with year y
, month m
, and
day d
.
setDay(Day D, int y, int m, int d) -> Day D'
D
is a valid instance of Day
.y != 0 && 1 <= m <= 12 && 1 <= d <= #days in month
.(y,m,d)
does not fall in the gap formed by the
change to the modern (Gregorian) calendar.
D'
is a valid instance of Day
.D'= D
except with
year y
, month m
, and day d
.
Question: Should we include setDay
,
setMonth
, and setYear
operations?
What problems might arise?
advance(Day D, int n) -> Day D'
D
is a valid instance of Day
.
D'
is a valid instance of Day
.D' = D
with the date moved
n
days later. (Negative n
moves to an
earlier date.)
getDay(Day D) -> int d
D
is a valid instance of Day
.
d
is day of the month from D
, where
1 <= d <= #days in month getMonth(D)
.D
is unchanged.)
getMonth(Day D) -> int m
D
is a valid instance of Day
.
m
is the month from D
, where
1 <= m <= 12
.D
is unchanged.)
getYear(Day D) -> int y
D
is a valid instance of Day
.
y
is the year from D
, where
y != 0
.D
is unchanged.)
getWeekday(Day D) -> int wd
D
is a valid instance of Day
.
wd
is the day of the week upon which
D
falls: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.D
is unchanged.)
equals(Day D, Day D1) -> boolean eq
D
and D'
are valid instances of
Day
.
eq
is true if and only if D
and
D'
denote the same calendar date.D
and D'
are unchanged.)
daysBetween(Day D, Day D1) -> int d
D
and D'
are valid instances of
Day
.
d
is the number of calendar days from
D1
to D
, i.e.,
equals(D,advance(D1,d))
would be true.D
is unchanged.)
toString(Day D) -> String s
D
is a valid instance of Day
.
s
is the date D
expressed in the format
"Day[getYear(D)
,getMonth(D)
,getDay(D)
]".D
is unchanged.)
destroy(Day D) ->
D
is a valid instance of Day
.
D
no longer exists.
// Parts of this come unmodified from _Core_Java_ (First Edition). // Not all the commenting has been made consistent. import java.util.*; public class Day { // Interface Invariant: Once created and until destroyed, this // instance contains a valid date. getYear() != 0 && // 1 <= getMonth() <= 12 && 1 <= getDay() <= #days in getMonth(). // Also calendar date getMonth()/getDay()/getYear() does not // fall in the gap formed by the change to the modern // (Gregorian) calendar. // Constructors public Day() // Pre: true // Post: the new instance's day, month, and year set to today's // date (i.e., the date of creation of the instance) { java.util.Date today = new java.util.Date(); year = today.getYear() + 1900; month = today.getMonth() + 1; day = today.getDay(); } public Day(int y, int m, int d) throws IllegalArgumentException // Pre: y != 0 && 1 <= m <= 12 && 1 <= d <= #days in month m // (y,m,d) does not fall in the gap formed by the change // to the modern (Gregorian) calendar. // Post: the new instance's day, month, and year set to y, m, // and d, respectively // Exception: IllegalArgumentException if y m d not a valid date { year = y; month = m; day = d; if (!isValid()) throw new IllegalArgumentException(); } // Mutators public void setDay(int y, int m, int d) throws IllegalArgumentException // Pre: y != 0 && 1 <= m <= 12 && 1 <= d <= #days in month m // (y,m,d) does not fall in the gap formed by the // change to the modern (Gregorian) calendar. // Post: this instance's day, month, and year set to year, month // and day set to y, m, and d, respectively // Exception: IllegalArgumentException if y m d not a valid date { year = y; month = m; day = d; if (!isValid()) throw new IllegalArgumentException(); } public void advance(int n) // Pre: true // Post: this instance's date moved n days later. (Negative n // moves to an earlier date.) { fromJulian(toJulian() + n); } // Accessors public int getDay() // Pre: true // Post: returns the day from this instance, where // 1 <= getDay() <= #days in this instance's month { return day; } public int getMonth() // Pre: true // Post: returns the month from this instance's date, where // 1 <= getMonth() <= 12 { return month; } public int getYear() // Pre: true // Post: returns the year from this instance's date, where // getYear() != 0 { return year; } public int getWeekday() // Pre: true // Post: returns the day of the week upon which this instance // falls, where 0 <= getWeekday() <= 6; // 0 == Sunday, 1 == Monday, ..., 6 == Saturday { // calculate day of week return (toJulian() + 1)% 7; } public boolean equals(Day dd) // Pre: dd is a valid instance of Day // Post: returns true if and only if this instance and instance // dd denote the same calendar date { return (year == dd.getYear() && month == dd.getMonth() && day == dd.getDay()); } public int daysBetween(Day dd) // Pre: dd is a valid instance of Day // Post: returns the number of calendar days from the dd // instance's date to this instance's date, where // equals(dd.advance(n)) would hold { // implementation code return toJulian() - dd.toJulian(); } public String toString() // Pre: true // Post: returns this instance's date expressed in the format // "Day[year,month,day]" { // implementation code return "Day[" + year + "," + month + "," + day + "]"; } // Destructors -- None needed // Private Methods -- Mostly borrowed from _Core_Java_ /** * Computes the number of days between two dates * @return true iff this is a valid date */ private boolean isValid() { Day t = new Day(); t.fromJulian(this.toJulian()); return t.day == day && t.month == month && t.year == year; } private int toJulian() /** * @return The Julian day number that begins at noon of * this day * Positive year signifies A.D., negative year B.C. * Remember that the year after 1 B.C. was 1 A.D. * * A convenient reference point is that May 23, 1968 noon * is Julian day 2440000. * * Julian day 0 is a Monday. * * This algorithm is from Press et al., Numerical Recipes * in C, 2nd ed., Cambridge University Press 1992 */ { int jy = year; if (year < 0) jy++; int jm = month; if (month > 2) jm++; else { jy--; jm += 13; } int jul = (int) (java.lang.Math.floor(365.25 * jy) + java.lang.Math.floor(30.6001*jm) + day + 1720995.0); int IGREG = 15 + 31*(10+12*1582); // Gregorian Calendar adopted Oct. 15, 1582 if (day + 31 * (month + 12 * year) >= IGREG) // change over to Gregorian calendar { int ja = (int)(0.01 * jy); jul += 2 - ja + (int)(0.25 * ja); } return jul; } private void fromJulian(int j) /** * Converts a Julian day to a calendar date * @param j the Julian date * This algorithm is from Press et al., Numerical Recipes * in C, 2nd ed., Cambridge University Press 1992 */ { int ja = j; int JGREG = 2299161; /* the Julian date of the adoption of the Gregorian calendar */ if (j >= JGREG) /* cross-over to Gregorian Calendar produces this correction */ { int jalpha = (int)(((float)(j - 1867216) - 0.25) / 36524.25); ja += 1 + jalpha - (int)(0.25 * jalpha); } int jb = ja + 1524; int jc = (int)(6680.0 + ((float)(jb-2439870) - 122.1)/365.25); int jd = (int)(365 * jc + (0.25 * jc)); int je = (int)((jb - jd)/30.6001); day = jb - jd - (int)(30.6001 * je); month = je - 1; if (month > 12) month -= 12; year = jc - 4715; if (month > 2) year--; if (year <= 0) year--; } // Implementation Invariants: // year != 0 && 1 <= month <= 12 && 1 <= day <= #days in month // (year,month,day) not in gap formed by the change to the // modern (Gregorian) calendar private int year; private int month; private int day; }
The design and implementation of ADTs (i.e., classes) must be approached from two points of view simultaneously:
Client-supplier relationship:
________________ ________________ | | | | | Client |===USES===>| Supplier | |________________| |________________| (ADT user) (ADT)
The supplier's concerns include:
The clients' concerns include:
As we have noted previously, the interface of an ADT is the set of features (i.e., public operations) provided by a supplier to clients.
A precise description of a supplier's interface forms a contract between clients and supplier.
The client-supplier contract:
The contract
If we are both the clients and suppliers in a design situation, we should consciously attempt to separate the two different areas of concern, switching back and forth between our supplier and client "hats".
We can use the following design criteria for evaluating ADT (class) interfaces. Of course, some of these criteria conflict with one another; a designer must carefully balance the criteria to achieve a good interface design.
Some of the material here is based on the presentation in the following books:
The first version of these lecture notes were written for use in the first Java-based version of CSCI 211 (File Systems) during the fall semester of 1996.
UP to CSCI 581 Lecture Notes root document?