C++ Defensive Programming:
Firewalls and Debugging Information
Michael Aivazis, ParaSoft
In this article, we present two proactive programming techniques that
assist in producing more robust code with minimal overhead in both programmer
effort and runtime performance. These techniques, "firewalls" and "debugging
information", are natural extensions of proven programming practices.
A firewall is a point in the logical flow of a program where the validity
of logical constraints is checked. If the constraint is satisfied, execution
proceeds. If the constraint is violated, the firewall triggers and generates
an appropriate error message. The triggering of a firewall is a severe error
that indicates that the program is logically inconsistent. There are two
different ways one can recover from a firewall and both require modifications
in the code. The first is to locate the part of the code that generated the
constraint violation and rewrite it. The other possibility is that the
constraint enforced by the firewall was improperly expressed, unduly
restrictive or incorrect. The correct recovery in this situation is to
rewrite the firewall so that it reflects the correct constraint more
accurately. In either case, the modifications required lead to code that is
cleaner and more maintainable.
The debugging information system keeps the advantages of inserting
diagnostic output at appropriate places in the code and adds considerable
flexibility. Traditionally, while a piece of code is first written, the
programmer spends considerable time writing diagnostic code in order to
monitor the program execution. Typically, these pieces of code are removed
after the initial debugging session. The system presented here allows for
the grouping of related pieces of information by name and places them under
the control of an environment variable. In this manner, only the information
relevant to the current debugging session is displayed and the effort that
is expended in generating this information is not wasted. If desired, such
statements can be automatically disabled for the production system, using
specially constructed classes for C++ programs and the preprocessor for C
programs.
As stated earlier, a firewall is a point in the logical flow of a program
where the validity of the assumptions made by the code that follows is
asserted. Verifying that these assumptions are indeed correct is distinct
from normal error checking. Invalid user input or a bad return value from
some system call are reasons to generate error messages, throw exceptions or
employ whatever error reporting mechanism the designer of the system has
chosen, but are not reasons to trigger a firewall.
The firing of a firewall is an alarm for the developer and indicates that
the internal state of the program is inconsistent. It could happen, for
example, if a function that expects its integer argument to be strictly
positive receives either zero or a negative number. Another example is a
function that expects a pointer argument that is implicitly assumed to be
non-null. Such a function is very likely to dereference this argument without
first checking whether it is a null pointer.
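The distinction between ordinary error checking and a firewall can be sketched as follows. This is a minimal illustration that uses the standard assert macro as the firewall; the function names averageOf and parseCount are hypothetical and do not come from the article.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical internal helper: every caller inside the program guarantees
// count > 0. A violation is a bug in our own code, so it is asserted.
int averageOf(int total, int count)
{
    assert(count > 0);  // firewall-style check of an internal assumption
    return total / count;
}

// Hypothetical boundary with the outside world: bad input is expected here
// and is handled through ordinary error reporting (an exception).
int parseCount(const std::string & text)
{
    int value = std::stoi(text);  // throws std::invalid_argument on garbage
    if (value <= 0) {
        throw std::invalid_argument("count must be strictly positive");
    }
    return value;
}
```

Input that fails parseCount is a normal runtime event. A failing assertion in averageOf means that some caller skipped the validation, which is a defect to be fixed in the code.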
There are two ways to recover from the firing of a firewall: either the
constraint expressed in the firewall is bad, i.e. violating it is
correct behavior for the program, or the code before the firewall must
be modified so that the constraint is not violated.
In some sense, a firewall is live documentation that must be
maintained and can never be out of sync with the correct state of the program,
so that it always reflects accurately the expectations of the designer. At
first, most firewalls appear trivial and superfluous. However, experience with
large projects shows that as software systems evolve and age, the implicit
assumptions they make on their environment become more and more likely to be
violated. In many cases, even the original designer of a piece of code will
have a hard time reconstructing what constitutes proper use of a piece of
code.
The ideal system for the validation of the internal consistency of a
software system would generate firewalls automatically based on the
surrounding code. Unfortunately, such a system is still outside the realm of
the currently feasible and the burden of implementing and maintaining the
consistency checks rests mainly with the programmer.
A practical alternative is to provide a method that makes expressing the
system invariants simple and easy to learn, and that offers a high return in
exchange for negligible overhead, both in terms of system performance and
developer productivity.
Firewalls should not alter the execution flow of a program. This
requirement is important in order to ensure that the code being tested is in
all respects identical to the shipping version. Firewalls should have a
negligible impact on the performance of the shipping version. For the
in-house version, the overhead should still be small enough so that
performance is not seriously compromised. The system should be simple to
learn and maintain so that its use is effortless. The constraint validation
should consist of simple checks that generally avoid allocating large memory
blocks or calling expensive routines. More expensive testing is almost
certainly useful, but it is beyond the scope of firewalls.
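One practical consequence of the first requirement is that a check must never carry a side effect the program depends on, because the check disappears in the shipping build. The sketch below uses the standard assert macro to illustrate this; popLast and drain are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper with a side effect: removes and returns the last item.
int popLast(std::vector<int> & values)
{
    int last = values.back();
    values.pop_back();
    return last;
}

// Risky form (shown only as a comment): the pop would vanish together with
// the check when the assertion is compiled out.
//     assert(popLast(values) >= 0);

// Safer form: the side effect happens unconditionally and the check only
// observes the result.
int drain(std::vector<int> & values)
{
    int last = popLast(values);
    assert(last >= 0);
    return last;
}
```

With this arrangement the tested build and the shipping build execute the same sequence of state changes.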
One of the simplest cases of firewalls is validating that the parameters
passed to a function satisfy trivial constraints. For example, if a function
assumes that a pointer argument is non-null and the required check is deemed
an unacceptable performance penalty for the production system, the check can
be implemented as a firewall.
Suppose that we are implementing a doubly linked circular list in terms of
a ListNode class. ListNodes have two pointers to
other ListNodes, the _next and _prev
fields, and a few convenience functions.
The ListNode class might look something like the following:
class ListNode
{
public:
ListNode() : _prev(this), _next(this) {}
void join(ListNode *);
// Other methods follow ...
// Implementation
private:
ListNode * _prev;
ListNode * _next;
};
The method join takes a pointer to another ListNode and
arranges the ListNode pointers so that the new node is inserted
after it:
// join: insert newNode after me
// Assumes that newNode is not null
void ListNode::join(ListNode * newNode)
{
newNode->_prev = this;
newNode->_next = _next;
_next->_prev = newNode;
_next = newNode;
return;
}
This piece of code contains three pointer dereferences and doesn't test
any of them. The rationale is simple. The parameter newNode is
not allowed to be null, by design; there is a comment right above the
implementation that states this requirement very clearly. This implies that
the first two lines in the method join are safe. Also, recall
that this node is supposed to be part of a circular list, so _next
and _prev always point to other ListNodes.
The fundamental flaw in this argument is that these constraints exist
largely in the minds of the implementers of this class or, at best, are partly
reflected in the overall design of the software system. Unfortunately, the
comments surrounding a piece of code are tossed out by the preprocessor and
do not turn into executable statements. In practice, it is easy to imagine
how either or both of these assumptions could be violated.
The above code could be made significantly more robust by using the
assert macro, available with most modern C and C++ compilers.
The assert macro takes an expression as an argument. If the
expression evaluates to true, execution resumes with the
statement that follows. If the expression is false, an error message is sent
to stderr and the program is forced to terminate. On UNIX
systems one gets an IOT trap (SIGABRT) that typically results in the
generation of a core file, which in turn allows the developer to pinpoint
the reason for the assertion failure. The runtime check can be removed by
merely recompiling the code with the symbol NDEBUG defined on
the compiler command line, eliminating the runtime overhead for the production
system. A more conscientious developer might use this facility and improve the
above code as follows:
// join: insert newNode after me
// Assumes that newNode is not null
void ListNode::join(ListNode * newNode)
{
// Enforce a non-null newNode
assert(newNode != 0);
newNode->_prev = this;
newNode->_next = _next;
// Verify that my _next pointer is non-null
assert(_next != 0);
_next->_prev = newNode;
_next = newNode;
return;
}
The assert macro succeeds very well in satisfying both design goals. It
enforces the assumptions made by the designer and its impact on the
performance is small for the in-house version and non-existent for the
production system. It suffers from the following drawbacks:
- It treats all constraint violations uniformly by halting the
program, which discourages developers from using it as often as they should.
- It is cumbersome to use when the assertion is a lengthy
expression or an expression that must be evaluated in discrete
steps.
The class Firewall proposed here addresses both of these
drawbacks. This class provides a static method assert that is
a direct replacement for the assert macro. Using
Firewalls, we would rewrite the above code as follows:
// join: insert newNode after me
// Assumes that newNode is not null
void ListNode::join(ListNode * newNode)
{
// Enforce a non-null newNode
Firewall::assert(
newNode != 0,
__HERE__,
"node 0x%08lx: appending null node", this
);
// Verify that my _next pointer is non-null
Firewall::assert(
_next != 0,
__HERE__,
"node 0x%08lx: null _next pointer", this
);
// Append newNode
newNode->_prev = this;
newNode->_next = _next;
_next->_prev = newNode;
_next = newNode;
return;
}
The static method Firewall::assert takes a variable number of
arguments. The first is an expression that should evaluate to a bool.
The second, __HERE__, is a preprocessor macro that
expands into as much information about the location of the firewall as is
available from the preprocessor. Generally, this means at least __FILE__ and
__LINE__, giving filename and line number. Under gcc, __FUNCTION__
is also used. The third argument is a printf-style format string
that is used to generate a message at runtime. The subsequent arguments, if
any, obey the same rules as the arguments to printf. In this
example, we have printed the address of the offending node, to assist in
tracking down the problem with the debugger.
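The location macro and the printf-style message can be sketched as follows. This is only an illustration under stated assumptions: FW_HERE is a hypothetical stand-in for __HERE__ (names containing double underscores are reserved for the implementation), and formatHit is a hypothetical helper that returns the message as a string instead of printing it.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the article's __HERE__: expands to two
// arguments, the file name and the line number of the call site.
#define FW_HERE __FILE__, __LINE__

// Hypothetical helper: build "file:line: message" from a printf-style
// format string and its arguments.
std::string formatHit(const char * file, long line, const char * fmt, ...)
{
    char text[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(text, sizeof text, fmt, args);  // truncates if too long
    va_end(args);

    char full[512];
    std::snprintf(full, sizeof full, "%s:%ld: %s", file, line, text);
    return std::string(full);
}
```

A call such as formatHit(FW_HERE, "null _next pointer") then carries its own location without the caller having to type it.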
For constraint validation that is a little more involved, Firewall
offers the method active, which evaluates to
true when firewalls are enabled. In addition, the method hit
provides an unconditional firewall. Examples of both are shown below, where
we re-write join to include a check that this node is indeed
part of a doubly linked list.
// join: insert newNode after me
// Assumes that newNode is not null
void ListNode::join(ListNode * newNode)
{
// Check assumptions
if (Firewall::active()) {
// Check _next pointer
if (!_next) {
Firewall::hit(
__HERE__, "node 0x%08lx: null _next pointer", this
);
}
// Check _prev pointer
if (!_prev) {
Firewall::hit(
__HERE__, "node 0x%08lx: null _prev pointer", this
);
}
// Validate forward links
if (_next->_prev != this) {
Firewall::hit(
__HERE__, "node 0x%08lx: corrupt forward links", this
);
}
// Validate backward links
if (_prev->_next != this) {
Firewall::hit(
__HERE__, "node 0x%08lx: corrupt backward links", this
);
}
// Check newNode
if (!newNode) {
Firewall::hit(
__HERE__, "node 0x%08lx: appending null node", this
);
}
}
// Append newNode
newNode->_prev = this;
newNode->_next = _next;
_next->_prev = newNode;
_next = newNode;
return;
}
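The structural checks above can also be gathered into a single predicate, so that the null tests short-circuit before any neighbour is dereferenced. The following self-contained sketch uses the standard assert macro in place of the Firewall class; the method consistent and the accessors next and prev are hypothetical additions for illustration.

```cpp
#include <cassert>

class ListNode
{
public:
    ListNode() : _prev(this), _next(this) {}

    // Hypothetical predicate: true when both neighbours exist and point
    // back at this node. The null tests run before any dereference.
    bool consistent() const
    {
        return _next && _prev && _next->_prev == this && _prev->_next == this;
    }

    // join: insert newNode after me
    void join(ListNode * newNode)
    {
        assert(newNode != 0);   // newNode must not be null
        assert(consistent());   // this node must be part of a valid list
        newNode->_prev = this;
        newNode->_next = _next;
        _next->_prev = newNode;
        _next = newNode;
    }

    // Hypothetical accessors, added so the links can be inspected.
    ListNode * next() const { return _next; }
    ListNode * prev() const { return _prev; }

private:
    ListNode * _prev;
    ListNode * _next;
};
```

After a.join(&b) followed by a.join(&c), the forward order is a, c, b and back to a, and every node still satisfies consistent().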
In addition to the methods presented here, Firewall provides
a lot of support for interactive use during a debugging session. For example,
each method that generates a firewall hit calls the method
Firewall::trap so that there is a convenient location for a break
point. Firewalls can be enabled or disabled from the debugger by calling the
method Firewall::active. Users whose debuggers cannot call
functions can always set the underlying data members directly.
The implementation of the class Firewall is very
straightforward. If the symbol FIREWALL is not defined either
in some header file or on the compiler command line, the class
Firewall collapses into trivial inline functions that should be
completely thrown away by the optimizing phase of the compiler.
class Firewall
{
public:
static bool active() { return false; }
static void hit(const char *, ...) { return; }
static void assert(bool cond, ...) { return; }
};
Most compilers will evaluate the active method at compile time, since it
is an inline that always returns false, and will not even generate
code for the bodies of if statements that call
Firewall::active().
The same class looks as follows when the FIREWALL symbol is
defined for the preprocessor:
class Firewall
{
public:
static void trap();
static bool active() { return _active; }
static void hit(
const char *file, long line, const char *fmt = "", ...
);
static void assert(
bool cond,
const char *file, long line, const char *fmt = "", ...
);
private:
static void _init(); // bootstrap the firewall system
static bool _active; // true if firewalls are enabled
static bool _fatal; // true if firewalls should halt the program
};
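The article does not show the bodies of these methods. One possible way to fill them in is sketched below, under several assumptions that are not the author's: the checking method is named check rather than assert (once the standard assert header is included, assert is a macro and cannot be used as a method name), and the enable, hits and report members are hypothetical additions.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

class Firewall
{
public:
    static void trap() {}                      // convenient breakpoint location
    static bool active() { return _active; }
    // Hypothetical additions: a runtime switch and a hit counter.
    static void enable(bool on, bool fatal = false) { _active = on; _fatal = fatal; }
    static long hits() { return _hits; }

    // Unconditional firewall.
    static void hit(const char * file, long line, const char * fmt = "", ...)
    {
        if (!_active) return;
        va_list args;
        va_start(args, fmt);
        report(file, line, fmt, args);
        va_end(args);
    }

    // Conditional firewall; stands in for the article's assert method.
    static void check(bool cond, const char * file, long line,
                      const char * fmt = "", ...)
    {
        if (cond || !_active) return;
        va_list args;
        va_start(args, fmt);
        report(file, line, fmt, args);
        va_end(args);
    }

private:
    // Hypothetical helper: print the message, count it, then trap.
    static void report(const char * file, long line, const char * fmt, va_list args)
    {
        std::fprintf(stderr, "%s:%ld: firewall hit: ", file, line);
        std::vfprintf(stderr, fmt, args);
        std::fputc('\n', stderr);
        ++_hits;
        trap();
        if (_fatal) std::abort();
    }

    static bool _active;   // true if firewalls are enabled
    static bool _fatal;    // true if firewalls should halt the program
    static long _hits;     // number of firewall hits so far
};

bool Firewall::_active = true;
bool Firewall::_fatal = false;
long Firewall::_hits = 0;
```

Keeping the hit non-fatal by default lets a test run collect several violations in one pass, which addresses the first drawback of the assert macro listed earlier.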
This facility can be extended in a variety of ways. For example, an
obvious extension would allow firewalls to be controlled by the setting of
an environment variable, e.g. FIREWALLS. The settings supported
would be "off", "on" and "fatal", the latter causing the program to terminate
as soon as any firewall hits.
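A minimal sketch of reading such a setting is shown below. The FirewallMode structure and both function names are hypothetical, and treating an unset or unrecognized value as "off" is an arbitrary choice made for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical settings record for the firewall system.
struct FirewallMode {
    bool active;  // report firewall hits
    bool fatal;   // terminate the program on the first hit
};

// Interpret a FIREWALLS-style value. A null or unrecognized value leaves
// firewalls off (an assumption of this sketch).
FirewallMode parseFirewallMode(const char * value)
{
    if (value && std::strcmp(value, "on") == 0)    return {true, false};
    if (value && std::strcmp(value, "fatal") == 0) return {true, true};
    return {false, false};
}

// Read the setting from the environment, typically once at startup.
FirewallMode firewallModeFromEnvironment()
{
    return parseFirewallMode(std::getenv("FIREWALLS"));
}
```

Separating the parsing from the getenv call keeps the interpretation of the three settings easy to test without touching the process environment.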
The messages generated by the firewall system should be made consistent
with whatever error reporting mechanism is used by the rest of the
application. It is easy to extend the Firewall class to allow
redirection of the firewall messages to an arbitrary stream. Other approaches
are also equally easy. For example, the Firewall class
currently in use at ParaSoft knows how to generate firewall messages
that appear to our error reporting mechanisms as error messages with their
own set of suppression fields. These can be sent to stderr or
redirected to Insra, the GUI message collector for all ParaSoft
tools.
Another extension is generating stack traces when a firewall hits. At
ParaSoft we have two implementations of this: one which relies on
bookkeeping information maintained by the Firewall class itself, and another
which unwinds the stack from the current program counter location.
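The bookkeeping approach could look something like the following sketch. It is not the ParaSoft implementation, which the article does not show; TraceScope and its members are hypothetical, and the sketch assumes a single-threaded program in which each instrumented function declares a TraceScope on entry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bookkeeping class: each instrumented function registers its
// name on entry and removes it again when the scope ends.
class TraceScope
{
public:
    explicit TraceScope(const char * name) { frames().push_back(name); }
    ~TraceScope() { frames().pop_back(); }
    TraceScope(const TraceScope &) = delete;
    TraceScope & operator=(const TraceScope &) = delete;

    // Render the active frames, innermost first, one per line.
    static std::string dump()
    {
        std::string out;
        const std::vector<const char *> & stack = frames();
        for (auto it = stack.rbegin(); it != stack.rend(); ++it) {
            out += *it;
            out += '\n';
        }
        return out;
    }

private:
    static std::vector<const char *> & frames()
    {
        static std::vector<const char *> stack;  // single-threaded sketch
        return stack;
    }
};

// Two hypothetical instrumented functions used to exercise the trace.
std::string inner() { TraceScope scope("inner"); return TraceScope::dump(); }
std::string outer() { TraceScope scope("outer"); return inner(); }
```

A firewall hit could append the result of dump() to its message. The cost is one push and one pop per instrumented call, which is why this form of tracing only sees functions that opt in.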
Consider a function that takes as an argument the name of an external
program and is responsible for spawning it as a separate process and setting
up a pipe so that the parent can control the child. For our purposes, we will
set up a one-way communication channel; a more realistic example would set up
a bi-directional pipe. The code might look something like this:
bool OS::spawn(string program, int & child_pid, int & channel)
{
int to_child[2];
if (pipe(to_child) < 0) {
return false;
}
child_pid = fork();
// Handle fork failure
if (child_pid < 0) {
return false;
}
// The parent process
if (child_pid > 0) {
close(to_child[0]); // Close the unusable end of the pipe
channel = to_child[1]; // Return the write side of the pipe
return true;
}
// The child process
close(to_child[1]); // Close the unusable end of the pipe
close(0);
dup(to_child[0]); // read-side -> stdin
OS::execvp(program); // Spawn the program
exit(-1); // If we get here, OS::execvp failed!
}
During the initial development of this code, it is very likely that the
developer inserted code that generated appropriate messages as execution
proceeded along the various stages of this code. These diagnostics might have
included informational items such as the process id of the parent and the
child and the actual file descriptor number assigned by the system to this
pipe. It is also very likely that the error handling branches generated
diagnostics as well, since knowing that the error handling code was not
executed gives good feedback about the correctness of the code.
However, the production system has no need for such verbose output.
Therefore, the developer deleted all the carefully formatted and well thought
out diagnostics leaving the code as shown above.
Unfortunately, this effort will almost certainly be duplicated during a
bug fix. The first time a developer experiences strange behavior while
attempting to spawn an external utility, she will have to write diagnostic
code very similar to the original.
This problem can be solved permanently by designing a facility which allows
the diagnostics to reside permanently in the code, forces the developers to
maintain them and can selectively turn them on whenever needed. Just like
firewalls, this system should collapse for the production system, although it
might be useful to let select diagnostics remain available under a command
line switch or the setting of some environment variable for debugging in the
field.
Such a facility is encapsulated in the class Debug shown below. At compile
time, the symbol DEBUG determines which version of the code will
be compiled. At runtime, individual categories of debugging information can
be enabled by setting the environment variable DEBUG_OPT to a
semicolon-delimited list of debug categories.
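Splitting such a list into category names might be done along these lines. The function names parseCategories and categoryEnabled are hypothetical, and ignoring empty entries is an assumption of this sketch.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Split a semicolon-delimited list such as "OS::spawn;parser" into a set of
// category names. Empty entries are ignored (an assumption of this sketch).
std::set<std::string> parseCategories(const std::string & spec)
{
    std::set<std::string> names;
    std::istringstream in(spec);
    std::string item;
    while (std::getline(in, item, ';')) {
        if (!item.empty()) {
            names.insert(item);
        }
    }
    return names;
}

// True when the given category appears in the enabled set.
bool categoryEnabled(const std::set<std::string> & enabled,
                     const std::string & category)
{
    return enabled.count(category) != 0;
}
```

When the variable is set from a shell, the value needs to be quoted, because an unquoted semicolon is treated as a command separator.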
When the DEBUG symbol is not defined, the class Debug reduces
to the following:
class Debug
{
public:
Debug(const char *) {}
static bool active() { return false; }
static void out(const char *, ...) { return; }
};
whereas the fully functional version is given by
class Debug
{
public:
Debug(const char * category);
bool active() const { return _active; }
void out(const char *file, long line, const char * format, ...) const;
public:
static void trap();
private:
const char * _category; // Not a string, so no overhead until used
bool _active;
static bool _showAll;
};
Let's revisit our OS::spawn method and sprinkle in some
diagnostics. We instantiate an object of class Debug with
debugging category set to "OS::spawn" and then use
this object to generate messages.
bool OS::spawn(string program, int & child_pid, int & channel)
{
int to_child[2];
Debug info("OS::spawn"); // Here is the Debug instance
if (pipe(to_child) < 0) {
info.out("OS::spawn: pipe() failed!");
return false;
}
info.out("OS::spawn: ready to fork()");
child_pid = fork();
// Handle fork failure
if (child_pid < 0) {
info.out("OS::spawn: fork() failed!");
return false;
}
// The parent process
if (child_pid > 0) {
info.out("OS::spawn: parent process: child_pid=%d", child_pid);
close(to_child[0]); // Close the unusable end of the pipe
channel = to_child[1]; // Return the write side of the pipe
return true;
}
// The child process
close(to_child[1]); // Close the unusable end of the pipe
close(0);
dup(to_child[0]); // read end -> stdin
info.out("OS::spawn: child process: calling OS::execvp");
OS::execvp(program); // Spawn the program
exit(-1); // If we get here, OS::execvp failed!
}
This version of the code is much more stable than the previous one. It is
much less likely that it will be modified during a debugging session since
most of what it does can be observed and validated without even recompiling
it. But it can be made even more useful. A typical problem encountered with
code that sets up communication channels with other processes is testing the
overall robustness of the code. For example, if the child process dies and
the parent attempts to write to the pipe, a SIGPIPE is generated
by most UNIX variants.
Using the Debug class we can simulate this failure. Consider
the following fragment:
bool OS::spawn(string program, int & child_pid, int & channel)
{
.
.
.
// The child process
close(to_child[1]); // Close the unusable end of the pipe
Debug fail("OS::spawn-fail");
if (fail.active()) {
fail.out("OS::spawn: simulating execvp failure ...");
close(to_child[0]);
exit(-1);
}
close(0);
dup(to_child[0]); // Make the read end of the pipe our stdin
info.out("OS::spawn: child process: calling OS::execvp");
OS::execvp(program); // Spawn the program
exit(-1); // If we get here, OS::execvp failed!
}
This way the user of this code can test how well their code behaves when
this particular facility fails to perform as advertised.
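On the parent side, one common way to survive a dead child is to ignore SIGPIPE so that the failed write reports EPIPE instead of terminating the process. The sketch below assumes a POSIX system; writeToDeadPipe is a hypothetical function that reproduces the situation within a single process by closing the read end of the pipe itself.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Hypothetical demonstration: write to a pipe whose read end is already
// closed, as happens when the child has exited. Returns the errno value
// from the failed write(), or 0 if the write unexpectedly succeeds.
int writeToDeadPipe()
{
    int fds[2];
    if (pipe(fds) < 0) {
        return errno;
    }

    // Ignore SIGPIPE so that write() fails with EPIPE instead of the
    // signal terminating the process. The setting stays in effect afterwards.
    signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                        // no reader is left
    ssize_t written = write(fds[1], "x", 1);
    int err = (written < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

Code that checks the result of every write in this way can be exercised with the simulated failure above, without having to kill a real child process.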
The techniques presented here minimize the amount of wasted programming
effort, maximize the functional code created and increase the internal code
consistency. Programmers who learn to use these techniques will find
themselves less likely to misuse their own code.