The real time savings in automatic error detection
This paper explains where time is really used in software
development and debugging, and where time can be saved.
Automatic error detection tools are a blessing for programmers. These
tools enable programmers to find difficult errors such as memory
corruption, memory leaks, uninitialized memory usage and many other
types of errors. With these tools, bugs can be found in a matter of
minutes or hours. Without them, it might take weeks or even months to
track down some bugs.
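For example, a few lines of C can contain several of these error classes at once. The following toy program is a hypothetical sketch, not code from any application discussed below:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *buf = (char *)malloc(8);
        int  *val = (int *)malloc(sizeof(int));

        strcpy(buf, "too long!");  /* memory corruption: copies 10 bytes
                                      (including the terminator) into an
                                      8-byte block */

        int copy = *val;           /* uninitialized memory read: *val was
                                      never assigned */

        free(buf);
        return copy & 1;           /* val is never freed: memory leak */
    }

Each of these bugs is silent in an ordinary run; an automatic error detection tool reports all three immediately, with file and line information.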
If these tools are so useful, why don't programmers use them all the
time? One would think that programmers would use these tools on every
piece of code they write, from the time the code is constructed to
when it is ready to run. Unfortunately, they do not do
this. Perception, speed, and inertia are some of the reasons why
programmers do not use these tools as often as they should.
The perception comes from the fact that programmers are usually very
optimistic. They do not believe that they have errors in the code they
write. If an error does turn up, programmers assume that they can
find it quickly and easily. It is often not that easy: in most cases,
they don't realize that they will have to work many extra hours to
locate these errors.
Inertia is the programmer's unwillingness to change the current
development procedure. Using error detection tools often requires
modifications to the compilation and linking phase. On UNIX machines,
the programmer may have to modify some makefiles. On Windows 95/98/NT or
2000 platforms, the programmer has to modify settings in the IDE
to build a debugging version.
In many cases, having to make modifications is enough to stop
programmers from using error detection tools. A programmer with an
error may think, "If I use this tool I will have to modify my
build procedure and wait until I have a debugging version ready... I
won't use the tool, I'll just look for the bug now. I'm sure I can
find the bug in no time." In the meantime, the bug turns out to
be harder to find than it seemed at first. After half a day of looking
for the bug, the programmer begins to be more willing to look for an
alternative method.
Statistics suggest that this is common practice. Microsoft performed
an internal study and discovered that it takes an average of 12 hours
to find and fix a bug (see Steve Maguire, "Writing Solid Code").
This could be avoided if programmers were more willing to use tools.
It is true that error detection tools require a start-up time before being
used. However, start-up time can be minimized by structuring the
development environment properly.
Before we go on, let's first look at the amount of start-up time
required to use automatic error detection tools like Insure++. We
performed our measurements on two platforms: a UNIX workstation and
a Windows NT system. Most of the perceived start-up time comes from
preparing the program for testing, which consists of compiling and
linking the application.
We compiled a product with 50,000 lines of C code on both platforms.
Insure++
from ParaSoft
was used for UNIX and BoundsChecker Professional from
NuMega Technologies Inc.
was used for Windows NT. We chose these tools because BoundsChecker
Professional includes technology from ParaSoft. The time measurements
are as follows:
Platform     | Normal      | Runtime Error Detection
-------------|-------------|------------------------
UNIX         | 4min 28sec  | 15min 20sec
Windows NT   | 8min 28sec  | 27min 45sec

Table 1: Time to completely build application (50,000 lines)
As indicated above, compiling and linking the executable for testing
takes roughly three times as long as a normal build. This is one
reason programmers hesitate to use these tools: they know that
without a debugging version ready they will need to recompile the
entire application, and that generating the test version will take
about three times longer. At this point, they think to themselves,
"I may as well try to find the bug myself."
When developing code, programmers almost always compile and link their
code with debugging switches turned on. They do this because they
expect to use the debugger. When they realize they have a bug, they
don't want to spend time waiting to build a debugging version.
Good programmers have found a solution to this problem. They keep two
versions of the libraries and objects needed to build the code they
are working on. The first version does not contain debug information,
while the second one does. Normally, programmers work on the version
without debugging. If they find a problem they simply switch to the
debugging version.
When they switch, they keep the debugging library around. This way,
when they build an updated version, they only need to recompile the
files modified since the last build of the debugging version.
This shortens the time needed to build the debugging version, since a
rebuild usually involves recompiling only one or two source files and
relinking. The same technique can be used for automatic error
detection: we suggest keeping the instrumented libraries around as
well. This reduces compilation time and makes working with automatic
debuggers faster and more efficient.
The following experiment was performed to support the suggestion made
above. Using the same 50,000 lines of C code, we modified two of the
files. Then we measured the time it takes to rebuild the application
on UNIX and NT platforms.
Platform     | Normal      | Runtime Error Detection
-------------|-------------|------------------------
UNIX         | 0min 18sec  | 1min 2sec
Windows NT   | 0min 17sec  | 0min 48sec

Table 2: Time to rebuild application (2 files)
As we can see, the rebuild time was dramatically shorter than the
complete build. Once past the initial build, there is practically no
difference in working with or without automatic debuggers. In fact,
the amount of time spent rebuilding after changing a small number of files
is negligible when compared to the total amount of time used to build
an application. This makes the benefits of automatic debugging far
outweigh the costs, when the tools are used properly.
Once you are employing the model where only incremental compilation is
required, it's easy to see that most of the time is spent relinking
the application: the cost of recompiling one additional object module
is negligible compared to linking the entire executable. The following
table breaks the rebuild time down into compilation and linking. As
you can see, the link phase takes about three times as long as the
compile phase. For complicated programs with many object files, the
difference becomes even more pronounced.
Platform     | Normal               | Runtime Error Detection
             | compile   | link     | compile   | link
-------------|-----------|----------|-----------|----------
UNIX         | 5sec      | 15sec    | 15sec     | 47sec
Windows NT   | 4sec      | 13sec    | 6sec      | 42sec

Table 3: Rebuild time: compile vs. link
Thus, we believe that the speed claims made by vendors of automatic
error detection tools that only relink applications are without
basis: under typical usage, the majority of the time spent waiting is
in the link phase, which both approaches require. Removing the
re-compilation does not significantly reduce the delay due to
instrumentation.
Programmers who are serious about the quality of their code should be
using this method. Most development environments support the ability
to build and maintain multiple libraries and objects for the same
application. Using Visual C++ with Windows, this is done by creating
a new build target similar to the "Debug" and "Release" ones which
AppWizard creates for you - when you want to instrument, choose the
"Instrumentation" build and go.
On Unix, you can achieve the same results with a simple modification
to your makefiles. Here's a very tiny example makefile which
illustrates the key points.
Before Modification

    LIB=libtest.a

    all: myapp

    $(LIB): libfile.c
            cc -c libfile.c
            ar rv $(LIB) libfile.o

    myapp: $(LIB) myapp.c
            cc -o myapp myapp.c $(LIB)

After Modification

    EXT=release
    LIB=libtest_$(EXT).a

    [rest of makefile is unchanged]
To build normally, type "make". To build a typical debug version, you
would type "make EXT=debug"; to build a runtime error detection
version, you type "make EXT=detect". This useful technique requires
only a very small investment and pays tremendous dividends.
Now, if speed is not the issue, what is the basic difference between
automatic debuggers that recompile and ones that only relink? The
difference is in their ability to find bugs. It is well established that
tools that instrument the source code are more accurate than those
that instrument the object code. This is because the source code
contains all necessary information about the program. This information
can be used for debugging. At the object level, most of the
information is gone. There is not a trick in the world that can bring
it back!
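To make this concrete, consider a hypothetical C fragment (our own sketch, not taken from either vendor's documentation) in which an out-of-bounds write stays inside valid memory:

    struct record {
        char name[8];
        int  id;
    };

    int main(void)
    {
        struct record r = { "Ada", 42 };
        r.name[8] = '\0';   /* one element past the end of name[];
                               on typical layouts this overwrites
                               the first byte of r.id */
        return r.id;        /* id is silently corrupted */
    }

A tool that instruments the source code knows the declared bounds of each field and can flag the bad index. A tool that works at the object level typically guards only the boundaries of whole memory blocks; since this write never leaves the structure, it often passes unnoticed.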
When deciding which tool to use, we suggest looking at the tool's
ability to detect bugs. The tool that accurately detects more bugs is
always the better choice. Also, remember Microsoft's finding that it
takes an average of 12 hours to find and fix a bug; the tool that
finds such a bug saves you all of that time. How many errors a tool
can locate is more important than how fast it operates. The tool is
really competing against 12 hours of human time, and it wins hands
down. In addition, a good tool will find bugs that you may not even
know exist, saving you not only money but time as well.
ParaSoft, Insure++, TCA, Inuse, SCI, RPT, and
CodeWizard are trademarks or service marks of ParaSoft
Corporation.
NuMega Technologies, Inc., and BoundsChecker Professional are trademarks of
NuMega Technologies, Inc.
All other product names mentioned within these pages are the trademarks
of their respective owners.