The real time savings in automatic error detection
This paper explains where time is really used in software
development and debugging, and where time can be saved.
Automatic error detection tools are a blessing for programmers. These
tools enable programmers to find difficult errors such as memory
corruption, memory leaks, uninitialized memory usage and many other
types of errors. With these tools, bugs can be found in a matter of
minutes or hours. Without them, it might take weeks or even months to
track down some bugs.
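For example, a few lines of C can contain several of these error classes at once. The following toy program is a hypothetical sketch, not code from any application discussed below:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *buf = (char *)malloc(8);
        int  *val = (int *)malloc(sizeof(int));

        strcpy(buf, "too long!");  /* memory corruption: copies 10 bytes
                                      (including the terminator) into an
                                      8-byte block */

        int copy = *val;           /* uninitialized memory read: *val was
                                      never assigned */

        free(buf);
        return copy & 1;           /* val is never freed: memory leak */
    }

Each of these bugs is silent in an ordinary run; an automatic error detection tool reports all three immediately, with file and line information.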
If these tools are so useful, why don't programmers use them all the
time? One would think that programmers would use these tools on every
piece of code they write, from the time the code is constructed to
when it is ready to run. Unfortunately, they do not do
this. Perception, speed, and inertia are some of the reasons why
programmers do not use these tools as often as they should.
The perception comes from the fact that programmers are usually very
optimistic. They do not believe that they have errors in the code they
write. If an error does turn up, programmers assume that they can
find it quickly and easily. It is often not that easy: in most cases,
they don't realize that they will have to work many extra hours to
locate these errors.
Inertia is the programmer's unwillingness to change the current
development procedure. Using error detection tools often requires
modifications to the compilation and linking phase. On UNIX machines,
the programmer may have to modify some makefiles. On Windows 95/98/NT or
2000 platforms, the programmer has to modify settings in the IDE
to build a debugging version.
In many cases, having to make modifications is enough to stop
programmers from using error detection tools. A programmer with an
error may think, "If I use this tool I will have to modify my
build procedure and wait until I have a debugging version ready... I
won't use the tool, I'll just look for the bug now. I'm sure I can
find the bug in no time." In the meantime, the bug turns out to
be harder to find than it seemed at first. After half a day of looking
for the bug, the programmer begins to be more willing to look for an
alternative method.
Statistics suggest that this is common practice. Microsoft performed
an internal study and discovered that it takes an average of 12 hours
to find and fix a bug (see Steve Maguire, "Writing Solid Code").
This could be avoided if programmers were more willing to use tools.
It is true that error detection tools require a start-up time before being
used. However, start-up time can be minimized by structuring the
development environment properly.
Before we go on, let's first look at the amount of start-up time
required to use automatic error detection tools like Insure++. We
performed our measurements on two platforms: a UNIX workstation and
a Windows NT system. Most of the perceived start-up time comes from
preparing the program for testing, which consists of compiling and
linking the application.
We compiled a product with 50,000 lines of C code on both platforms.
Insure++
from ParaSoft
was used for UNIX and BoundsChecker Professional from
NuMega Technologies Inc.
was used for Windows NT. We chose these tools because BoundsChecker
Professional includes technology from ParaSoft. The time measurements
are as follows:
Platform     | Normal      | Runtime Error Detection
-------------|-------------|------------------------
UNIX         | 4min 28sec  | 15min 20sec
Windows NT   | 8min 28sec  | 27min 45sec

Table 1: Time to completely build application (50,000 lines)
As indicated above, compiling and linking the executable for testing
takes roughly three times as long as a normal build. This is one
reason programmers hesitate to use these tools: they know that
without a debugging version ready they will need to recompile the
entire application, and that generating the test version will take
about three times longer. At this point, they think to themselves,
"I may as well try to find the bug myself."
When developing code, programmers almost always compile and link their
code with debugging switches turned on. They do this because they
expect to use the debugger. When they realize they have a bug, they
don't want to spend time waiting to build a debugging version.
Good programmers have found a solution to this problem. They keep two
versions of the libraries and objects needed to build the code they
are working on. The first version does not contain debug information,
while the second one does. Normally, programmers work on the version
without debugging. If they find a problem they simply switch to the
debugging version.
When they switch, they keep the debugging library around. This way,
when they build an updated version, they only need to recompile the
files modified since the last build of the debugging version.
This shortens the time needed to build the debugging version, since a
rebuild usually involves recompiling only one or two source files and
relinking. The same technique can be used for automatic error
detection: we suggest keeping the instrumented libraries around as
well. This reduces compilation time and makes working with automatic
debuggers faster and more efficient.
The following experiment was performed to support the suggestion made
above. Using the same 50,000 lines of C code, we modified two of the
files. Then we measured the time it takes to rebuild the application
on UNIX and NT platforms.
Platform     | Normal      | Runtime Error Detection
-------------|-------------|------------------------
UNIX         | 0min 18sec  | 1min 2sec
Windows NT   | 0min 17sec  | 0min 48sec

Table 2: Time to rebuild application (2 files)
As we can see, the rebuild time was dramatically shorter than the
complete build. Once past the initial build, there is practically no
difference in working with or without automatic debuggers. In fact,
the amount of time spent rebuilding after changing a small number of files
is negligible when compared to the total amount of time used to build
an application. This makes the benefits of automatic debugging far
outweigh the costs, when the tools are used properly.
Once you are employing the model where only incremental compilation is
required, it's easy to see that most of the time is spent relinking
the application: the cost of recompiling one additional object module
is negligible compared to linking the entire executable. The following
table breaks the rebuild time down into compilation and linking. As
you can see, the link phase takes about three times as long as the
compile phase. For complicated programs with many object files, the
difference becomes even more pronounced.
Platform     | Normal               | Runtime Error Detection
             | compile   | link     | compile   | link
-------------|-----------|----------|-----------|----------
UNIX         | 5sec      | 15sec    | 15sec     | 47sec
Windows NT   | 4sec      | 13sec    | 6sec      | 42sec

Table 3: Rebuild time: compile vs. link
Thus, we believe that the speed claims made by vendors of automatic
error detection tools that only relink applications are without
basis: under typical usage, the majority of the time spent waiting is
in the link phase, which both approaches require. Removing the
re-compilation does not significantly reduce the delay due to
instrumentation.
Programmers who are serious about the quality of their code should be
using this method. Most development environments support the ability
to build and maintain multiple libraries and objects for the same
application. Using Visual C++ with Windows, this is done by creating
a new build target similar to the "Debug" and "Release" ones which
AppWizard creates for you - when you want to instrument, choose the
"Instrumentation" build and go.
On Unix, you can achieve the same results with a simple modification
to your makefiles. Here's a very tiny example makefile which
illustrates the key points.
Before Modification

    LIB=libtest.a

    all: myapp

    $(LIB): libfile.c
            cc -c libfile.c
            ar rv $(LIB) libfile.o

    myapp: $(LIB) myapp.c
            cc -o myapp myapp.c $(LIB)

After Modification

    EXT=release
    LIB=libtest_$(EXT).a

    [rest of makefile is unchanged]
To build normally, type "make". To build a typical debug version, you
would type "make EXT=debug"; to build a runtime error detection
version, you type "make EXT=detect". This useful technique requires
only a very small investment and pays tremendous dividends.
Now, if speed is not the issue, what is the basic difference between
automatic debuggers that recompile and ones that only relink? The
difference is in their ability to find bugs. It is well established that
tools that instrument the source code are more accurate than those
that instrument the object code. This is because the source code
contains all necessary information about the program. This information
can be used for debugging. At the object level, most of the
information is gone. There is not a trick in the world that can bring
it back!
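To make this concrete, consider a hypothetical C fragment (our own sketch, not taken from either vendor's documentation) in which an out-of-bounds write stays inside valid memory:

    struct record {
        char name[8];
        int  id;
    };

    int main(void)
    {
        struct record r = { "Ada", 42 };
        r.name[8] = '\0';   /* one element past the end of name[];
                               on typical layouts this overwrites
                               the first byte of r.id */
        return r.id;        /* id is silently corrupted */
    }

A tool that instruments the source code knows the declared bounds of each field and can flag the bad index. A tool that works at the object level typically guards only the boundaries of whole memory blocks; since this write never leaves the structure, it often passes unnoticed.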
When deciding which tool to use, we suggest looking at the tool's
ability to detect bugs. The tool that accurately detects more bugs is
always the better choice. Also, remember Microsoft's finding that it
takes an average of 12 hours to find and fix a bug; the tool that
finds such a bug saves you all of that time. How many errors a tool
can locate is more important than how fast it operates. The tool is
really competing against 12 hours of human time, and it wins hands
down. In addition, a good tool will find bugs that you may not even
know exist, saving you not only money but time as well.
ParaSoft, Insure++, TCA, Inuse, SCI, RPT, and
CodeWizard are trademarks or service marks of ParaSoft
Corporation.
NuMega Technologies, Inc., and BoundsChecker Professional are trademarks of
NuMega Technologies, Inc.
All other product names mentioned within these pages are the trademarks
of their respective owners.