ParaSoft

Analysis of Runtime Debugging Technology
(Comparison of Insure++ and Purify)

Why is a runtime debugger important?

The universal goal of a runtime debugger is detecting errors in your code, large and small. A debugger even finds hidden errors that the most advanced programmer can overlook. Regardless of where you think a problem exists, the debugger will immediately inform you of memory leaks, improper memory usage, memory fragmentation, and other problems in your program. A runtime debugger also has the ability to locate potential problems that may not be caught during in-house testing. These reported errors not only save you time, but reduce support costs by helping you avoid future problems. A runtime debugger provides you with a more accurate, efficient procedure for developing and maintaining the highest quality software possible.

Tools and techniques for debugging.

There are several techniques which may be used for runtime error checking. The primary purpose of these techniques is to convert an existing program into a new program which is functionally equivalent to the original one. The new program also contains extra code which checks for errors during program execution. Each individual instruction of the original program is sandwiched between the error-checking instructions. The run-time debugger's function is to correctly identify the errors and verify the accuracy of the original code. This particular technique for program modification is called code insertion.

Code insertion can be implemented at different stages of the software development cycle. When insertion is done at the compilation stage, it is called source code instrumentation. Insertion done directly on the executable or at the linking stage is called object code insertion.
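
To make the idea of code insertion concrete, here is a minimal sketch in C of a store that has been "sandwiched" with a check. The names `check_index`, `instrumented_store`, and `errors_reported` are hypothetical and chosen only for illustration. This is not the code that Insure++ or Purify actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counter maintained by the inserted checking code. */
int errors_reported = 0;

/* Inserted check: is index i valid for an array of n elements? */
int check_index(size_t i, size_t n, int line)
{
    if (i >= n) {
        fprintf(stderr, "line %d: index %lu out of bounds (size %lu)\n",
                line, (unsigned long) i, (unsigned long) n);
        errors_reported++;
        return 0;
    }
    return 1;
}

/* Original statement:   b[i] = value;
 * Instrumented version: the same store, preceded by the inserted check.
 * The bad store is skipped so that this sketch stays well defined. */
void instrumented_store(char *b, size_t n, size_t i, char value, int line)
{
    assert(b != NULL);
    if (check_index(i, n, line))
        b[i] = value;
}
```

Calling `instrumented_store(b, sizeof b, 10, 1, __LINE__)` on a ten-element array reports the error instead of silently corrupting memory. The program otherwise behaves as the original would.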

The inherent advantages of source code instrumentation.

Source code instrumentation involves parsing, analyzing, and converting the original source code into a new, equivalent source code. The equivalent code is stored in a temporary file that is passed to the compiler, which generates the object code. This is done in a manner that is completely transparent to the user. Throughout the process, the original source code file is not modified and the entire procedure does not require any user intervention. Once all of the files in the project are instrumented, they are linked into a final executable, which is then ready for run-time error detection.

During the detection phase, source code instrumentation is inherently more accurate in identifying many program implementation errors, as well as problems associated with memory corruption. This technique also makes all of the program's information available to the debugger. With this capability, source code instrumentation possesses a significant advantage over object code insertion.

The limitations of object code insertion.

Object code insertion involves the instrumentation of each object file and library. In this case, the instrumentation is only valid during the linking stage, where each object file and library are categorized as a portion of the executable being built. Each file is decoded, and additional object code is inserted to check the accuracy of the memory accesses. During this process, new equivalent modules are created in temporary files where new executables are built.

The final executable is linked once all of the object modules are instrumented. This process can also be made completely transparent to the user, and there are no changes to the original executable. Object code insertion at the executable level and object code insertion at the linking stage are identical with only one exception. When instrumenting an executable, the instrumentation is done on the executable itself, rather than on the independent modules.

It is important to know that object code insertion can ONLY be used to detect memory-related errors. This is because it relies on detecting instructions which operate on the memory. No other errors can be detected with it! This is a key factor in defining the technology of object code insertion.

Source Code Instrumentation:
The ultimate solution for debugging.

Accuracy is a significant difference between source code instrumentation and object code insertion. While source code instrumentation is highly accurate, the accuracy of object code insertion is open to question. Object code insertion techniques must continually guess about memory references, even with precise memory tracking. Since the output is based on guesses, consistency and confidence in such tools are suspect. With this in mind, one needs to analyze which technology a tool uses, and determine which functionality is most appropriate for the task.

Object code insertion may, in some cases, be more appealing to a developer because it seems quicker to employ and use. However, in a typical development environment, recompilation of the module is often necessary to generate object code anyway. The speed advantage of object code insertion can therefore be misleading, and may prove insignificant in the debugging process. Also, due to the limitations of object code insertion, not all errors can be detected.

These undetected errors will have to be discovered one day. The only remedy is to search for the hidden errors in a line-by-line manual process, which is even slower than source code instrumentation! Source code instrumentation is slower than object code insertion, but at its consistent speed every error can be detected, and time is saved in the long run. It is always more efficient to detect ALL of the errors, including those that are well hidden, rather than finding some of the errors quickly, and others not at all.

The remainder of this document compares two run-time error detection tools that use different techniques: Insure++ and Purify. Insure++ is a product of ParaSoft Corporation which uses source code instrumentation. Purify is a product of Pure Software which uses object code insertion at the link stage. The following discussion describes the differences between the two products and illustrates them with code examples.

Source Code Instrumentation vs. Object Code Insertion

In order to check your program, both Purify and Insure++ instrument the original program. The methods used, however, are quite different. Purify chooses the quick and easy method of instrumenting object code. Insure++, however, instruments the original source code in order to get the maximum checking. Unfortunately, much of the information gets lost when creating an object file; for robust checking, you have to go to the source.

Specifically, you get the following:

Compile time checking of code which may never even get executed. Many errors which Purify will delay to runtime, Insure++ can detect before the program even runs. For example:
```
void foo() {
    char a[10], b[10], c[10];
    b[10] = 1; /* Detected at compile time. */
}
```
Purify would only catch this error if the function foo were actually executed, whereas Insure++ would detect it at compile time. (Well, that's only partly true... actually Purify won't detect this particular error *ever*. We'll get to that later on.)
Pre-ANSI code gets ANSI level prototype checking. This feature is, of course, only useful for people with pre-ANSI compilers.
Data types are non-obscured (e.g. pointers vs. longs)
Some examples that compilers won't catch
```
/* Example one: printf mismatches... not detected by prototypes */
#include <stdio.h>
int main()
{
    printf("%d\n", 1.0); /* Passing double instead of int */
    return 0;
}
```
(Insure++ will catch errors like these even if the format string isn't determined until runtime, although only static strings show up at compile time. Purify won't ever catch this error either.)
Some compilers may catch some of these examples, while many others don't. Why take chances with potentially hard-to-find bugs?
All of this checking occurs before Purify even starts to look at the code with its object insertion. How common are these errors? More often than Purify would have you believe.
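
As a rough illustration of the printf check discussed above, the following sketch compares the conversions in a format string with the types of the arguments actually passed. The one-letter type codes (`'I'` for int, `'D'` for double, `'S'` for char *) and the function names are assumptions made for this example. The parser handles only plain conversions with no flags, widths, or length modifiers. It is not Insure++'s implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a printf conversion character to a hypothetical type code:
 * 'I' = int, 'D' = double, 'S' = char *. Returns 0 if unsupported. */
char expected_type(char conv)
{
    if (conv == 'd' || conv == 'i' || conv == 'c') return 'I';
    if (conv == 'f' || conv == 'e' || conv == 'g') return 'D';
    if (conv == 's') return 'S';
    return 0;
}

/* Count the conversions in fmt whose argument is missing or has the wrong
 * type. actual holds one type code per argument that was really passed. */
int count_format_mismatches(const char *fmt, const char *actual)
{
    int mismatches = 0;
    size_t arg = 0, nargs;

    assert(fmt != NULL && actual != NULL);
    nargs = strlen(actual);
    for (; *fmt; fmt++) {
        if (*fmt != '%')
            continue;
        fmt++;
        if (*fmt == '\0')
            break;              /* lone '%' at the end of the string */
        if (*fmt == '%')
            continue;           /* "%%" consumes no argument */
        if (arg >= nargs || expected_type(*fmt) != actual[arg])
            mismatches++;       /* missing or wrongly typed argument */
        arg++;
    }
    return mismatches;
}
```

For the call `printf("%d\n", 1.0)`, the argument list is described as `"D"`, and the check reports one mismatch. Because the comparison takes the format string as ordinary data, it applies equally to a string that is only built at runtime.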

Memory State Tracking

To detect runtime errors, any tool needs to maintain extra information about the program. Again, Purify chooses a quick and easy approach, while Insure++ implements a more robust and precise algorithm.

Purify maintains the accessibility and initialization status of each byte of memory: 2 bits for each byte in your program.

To detect dangling pointers, portions of memory must be left unusable. To detect overwrites, guard fences must be written. Only stack frames can be checked, and not individual stack variables.
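
The two-bits-per-byte scheme can be pictured with a small sketch. It models a toy 64-byte address space. The names (`shadow`, `mark_allocated`, `check_write`, `check_read`) and the packing of four state entries per char are assumptions made for this example, not a description of Purify's internals.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 64                       /* toy address space, in bytes */

enum { ACCESSIBLE = 1, INITIALIZED = 2 };   /* the two state bits */

unsigned char shadow[ARENA_SIZE / 4];       /* 2 bits per tracked byte */

unsigned get_state(size_t addr)
{
    assert(addr < ARENA_SIZE);
    return (shadow[addr / 4] >> ((addr % 4) * 2)) & 3u;
}

void set_state(size_t addr, unsigned state)
{
    unsigned shift;
    assert(addr < ARENA_SIZE && state <= 3u);
    shift = (unsigned) (addr % 4) * 2;
    shadow[addr / 4] = (unsigned char)
        ((shadow[addr / 4] & ~(3u << shift)) | (state << shift));
}

/* Mark n bytes starting at addr as allocated but not yet written. */
void mark_allocated(size_t addr, size_t n)
{
    size_t i;
    assert(addr + n <= ARENA_SIZE);
    for (i = 0; i < n; i++)
        set_state(addr + i, ACCESSIBLE);
}

/* A write is accepted if every byte it touches is accessible. */
int check_write(size_t addr, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (addr + i >= ARENA_SIZE || !(get_state(addr + i) & ACCESSIBLE))
            return 0;
    for (i = 0; i < n; i++)
        set_state(addr + i, ACCESSIBLE | INITIALIZED);
    return 1;
}

/* A read is accepted if every byte is accessible and initialized. */
int check_read(size_t addr, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (addr + i >= ARENA_SIZE ||
            get_state(addr + i) != (unsigned) (ACCESSIBLE | INITIALIZED))
            return 0;
    return 1;
}
```

If two ten-byte regions are marked at offsets 0 and 10 with no fence between them, an eleven-byte write at offset 0 passes `check_write`. The state bits record that a byte is usable, but not which variable it belongs to.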

In contrast, Insure++ monitors each instrumented pointer exactly. No matter how large or small your overwrite, how long ago you malloc'd or free'd, Insure++ will detect your bugs.

Examples: limitations of Purify

Overwrites of Stack Variables.
Since Purify treats each stack frame as one big variable, it cannot detect overwrites within a function's local variables.
```
#include <string.h>

int main()
{
    char a[10], b[10], c[10];
    int i;
    memset(b,0,11); /* writes one byte past the end of b */
    for (i = 0; i <= 10; i++) {
        b[i] = 2; /* b[10] is out of bounds */
    }
    return 0;
}
```
Purify detects neither the overwrite by memset, nor the direct overwrite. Insure++ detects both. To detect any similar bug with Purify, you must "get lucky", and overwrite a small fence on the stack frame.
Overwrites of Heap Variables.
To detect an overwrite on the heap with Purify, you'd better hold on to your rabbit's foot; unless you hit one of Purify's fences, you'll get no warning. Consider the following code:
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int *third_party_string_function(p, ct)
char *p;
int ct;
{
    p += ct;
    return (int *) p;
}

int main()
{
    int *p, i;
    char *q1, *q2;
    q1 = (char *) malloc(5); memset(q1, 0, 5);
    q2 = (char *) malloc(5); memset(q2, 0, 5);
    i = (q2 - q1);
    p = third_party_string_function(q1, i); /* p now points into q2's block */
    printf("*p = %d\n", *p);
    free(q1);
    return 0;
}
```
Insure++ knows where each pointer is supposed to be pointing without peeking around fences and guessing, and correctly identifies the error in this code, which Purify doesn't.
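
One way to picture "knowing where each pointer is supposed to be pointing" is a pointer that carries a record of the block it was derived from. The `tracked_ptr` structure and its helper functions below are hypothetical and serve only to illustrate the principle. They are not Insure++'s actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracked pointer: remembers the block it was derived from. */
struct tracked_ptr {
    char  *base;    /* start of the block this pointer belongs to */
    size_t size;    /* size of that block in bytes */
    long   offset;  /* current position relative to base */
};

struct tracked_ptr tracked_malloc(size_t n)
{
    struct tracked_ptr p;
    p.base = (char *) calloc(n, 1);
    p.size = (p.base != NULL) ? n : 0;
    p.offset = 0;
    return p;
}

/* Pointer arithmetic moves the offset but keeps the original block. */
struct tracked_ptr tracked_add(struct tracked_ptr p, long delta)
{
    p.offset += delta;
    return p;
}

/* An access of len bytes is valid only inside the pointer's own block,
 * no matter what other allocation happens to live at that address. */
int tracked_access_ok(struct tracked_ptr p, size_t len)
{
    if (p.base == NULL || p.offset < 0)
        return 0;
    if ((size_t) p.offset > p.size)
        return 0;
    return len <= p.size - (size_t) p.offset;
}
```

A pointer derived from a 5-byte block and advanced by 16 bytes fails the check. This holds even if the resulting address lies inside another valid allocation, which is the situation in the example above.
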
Dangling Pointers
Purify can detect dangling pointers reasonably well if you use them right after freeing the block; but if your program runs for a while, it's likely someone else may have come along and be occupying the same memory. Such is the case in the following program, for which Purify detects no bugs.
```
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *a, *b;
    int ct;
    b = (char *) malloc(10240);
    a = (char *) malloc(1024); free(a);
    free(b);
    ct = 0;
    while(1) {
        ct++;
        b = (char *) malloc(1024);
        if (b < a && a < b+1024)
            break;
        free(b);
    }
    *a = 2; /* a was freed above; it now dangles inside b's block */
    printf("ct = %d\n", ct);
    return 0;
}
```
Insure++ detects the dangling pointer with no problems, no matter when it happens. This is a fundamental difference between the two technologies - one is exact, Insure++, and the other is heuristic, Purify.
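
The difference between checking the state of an address and checking the pointer itself can be sketched with a toy allocator that reuses freed slots immediately. Everything here (`toy_alloc`, `toy_free`, the `handle` structure, the generation counters) is a hypothetical illustration of the principle, not the mechanism either product uses.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4

/* Toy allocator state: which slots are in use, and how many times each
 * slot has been freed (its "generation"). */
int in_use[NSLOTS];
unsigned generation[NSLOTS];

/* Hypothetical tracked pointer: a slot plus the generation it was issued in. */
struct handle {
    int slot;
    unsigned gen;
};

/* Hand out the lowest free slot, so freed memory is reused right away. */
struct handle toy_alloc(void)
{
    struct handle h = { -1, 0 };
    int i;
    for (i = 0; i < NSLOTS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            h.slot = i;
            h.gen = generation[i];
            break;
        }
    }
    return h;
}

void toy_free(struct handle h)
{
    assert(h.slot >= 0 && h.slot < NSLOTS);
    in_use[h.slot] = 0;
    generation[h.slot]++;   /* every old handle to this slot is now stale */
}

/* Address-state check: only asks "is this memory currently allocated?" */
int address_check(struct handle h)
{
    return h.slot >= 0 && h.slot < NSLOTS && in_use[h.slot];
}

/* Per-pointer check: also asks "is it still the allocation I was given?" */
int pointer_check(struct handle h)
{
    return address_check(h) && generation[h.slot] == h.gen;
}
```

Right after `toy_free`, both checks reject the stale handle. Once the slot has been handed out again, `address_check` accepts it, while `pointer_check` still reports it as dangling.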

Leak Detection

Insure++ and Purify take different approaches to leak detection. Purify produces, through an activated function call, a list of all outstanding memory blocks, along with a guess of whether each block may have been leaked at some point or not. This guess is based on a scan of all available memory, and will yield the wrong answer if any location coincidentally contains the same value as the address of the data block. More luck. This guessing shows up in the report, which flags each block as "not leaked" (could be wrong), "leaked" (definitely a leak, but where?), or "possibly leaked" (??).

Insure++, on the other hand, will report a leak as soon as it happens (i.e. when there are no pointers left to that block). It will give a stack trace and reason for leakage as well as the point at which the block was malloc'd. Obviously, a leak will be much easier to fix with this information. Additionally, there is no luck involved.

In the following code, we simulate the case where a long variable happens to hold the same value as the address of a malloc'd block. This may not happen in your code, but as the size and complexity of your program increases, the odds become stacked against you. Again, Insure++ detects all three leaks exactly, as they occur.

	#include <stdlib.h>
	long l1, l2;
	void foo()
	{
		char *c;
		c = (char *) malloc(10); /* Pure: "not leaked" */
		l1 = (long) c;
		c = (char *) malloc(10); /* Pure: "possibly leaked" */
		l2 = (long) c+1;
		c = (char *) malloc(10); /* Pure: "leaked" */
		return;
	}
	int main()
	{
		foo();
		return 0;
	}
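
The three verdicts in the comments above can be reproduced with a small sketch of a scan-based guess. The function `scan_for_block`, the `leak_guess` enumeration, and the use of `unsigned long` values to stand in for addresses are assumptions made for this example. A real tool's scan is certainly more involved.

```c
#include <assert.h>
#include <stddef.h>

enum leak_guess { NOT_LEAKED, POSSIBLY_LEAKED, LEAKED };

/* Guess the status of a block by scanning a set of memory words (roots):
 *   a word equal to the block's start address  -> NOT_LEAKED
 *   a word pointing into the block's interior  -> POSSIBLY_LEAKED
 *   no matching word at all                    -> LEAKED
 * The scan cannot tell a real pointer from an integer with the same value. */
enum leak_guess scan_for_block(const unsigned long *roots, size_t nroots,
                               unsigned long start, size_t size)
{
    enum leak_guess guess = LEAKED;
    size_t i;

    assert(roots != NULL || nroots == 0);
    for (i = 0; i < nroots; i++) {
        if (roots[i] == start)
            return NOT_LEAKED;
        if (roots[i] > start && roots[i] - start < size)
            guess = POSSIBLY_LEAKED;
    }
    return guess;
}
```

With `roots` holding the values of `l1` and `l2`, the first block is reported as not leaked, even though no pointer to it remains. The scan has no way to know that `l1` is only a long.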

Third Party Libraries

One advantage of object code insertion over source code instrumentation is that third party libraries are checked as thoroughly (i.e. with the same set of heuristics) as user code. Insure++ checks primarily the instrumented source code. Hence, most errors that occur within third-party libraries will not be checked. Code that calls third party libraries is checked thoroughly.

If you have the source to the 3rd party library, you can, of course, make an instrumented library. If you don't have the source, you won't even be able to fix any bugs which might be present. Insure++ has chosen to do a more rigorous job of testing user code at the expense of such third-party library checking.

Insure++ can, however, check that 3rd party libraries are called correctly, and can verify return values, via a mechanism called interfaces. We'll explain that in the next section.

Extensibility

The object code insertion technology is very limited in scope. For example, global memory usage is handled only through trapping malloc and a few other standard system calls. If your program uses a custom memory allocator, you lose checking for that entire region. Here's an example of a fixed block allocator: it allocates only 32 byte blocks, and is very fast. It's also built on top of malloc.

	#include <stdlib.h>
	static struct _block *avail_end = 0;
	static struct _block *avail = 0;
	struct _block *freelist = 0;
	struct _block
	{
		struct _block *next; /* Assume 4 byte pointers */
		char padding[28];
	};
	void *get_block()
	{
		struct _block *n;
		if (freelist) {
			n = freelist; freelist = freelist->next;
		} else if (avail < avail_end) {
			n = avail;
			avail++;
		} else {
			avail = (struct _block *) malloc(32 * 1000);
			avail_end = avail + 1000;
			n = avail; avail++;
		}
		return n;
	}
	void free_block(void *b)
	{
		((struct _block *) b)->next = freelist;
		freelist = (struct _block *) b;
	}

Here's a sample program which incorrectly uses this code:

	#include <string.h>
	void *get_block(void);
	void free_block(void *b);
	int main()
	{
		char *a;
		a = (char *) get_block();
		memset(a,0,60);
		return 0;
	}

Without additional information, neither Insure++ nor Purify would know exactly how your program is using the memory, and would therefore not flag this code. With Purify, the trip stops here. With Insure++, however, there is a simple way to explain how your special allocator works, so that the abstraction can be handled properly. This information is saved in an Interface File, which describes the interface to your routines. Here's one for the fixed allocator:

	void *get_block()
	{
	void *ret;
 
	ret = get_block();
	iic_alloc(ret, 32);
	return ret;
	}
	void free_block(void *b)
	{
	iic_unalloc(b);
	free_block(b);
	}

As you can see, basically all that needs to be done is to say "get_block allocates unique 32 byte chunks", and "free_block frees them". Once this is in place, Insure++ will detect over-writes, dangling pointers,... everything it can with malloc. In the sample test code, you'd get the following error message:

	"f1.c", line 8: WRITE_OVERFLOW
	>> memset(a,0,60);
	Writing overflows memory: <argument 1>
	bbbbbb
	| 32 | 28 |
	wwwwwwwwwww
	Writing (w)   : 0x00107300 thru 0x0010733b (60 bytes)
	To block (b) : 0x00107300 thru 0x0010731f (32 bytes)
	   a, allocated at f1.c, 7
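
To show the kind of bookkeeping that such an interface makes possible, here is a minimal sketch of a block registry. The functions `note_alloc`, `note_unalloc`, and `write_fits` are hypothetical stand-ins that play roles analogous to the interface calls above. They are not the Insure++ API, and the fixed-size table is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 16

/* Hypothetical registry of blocks handed out by a custom allocator. */
struct block_info {
    const char *start;
    size_t      size;
    int         live;
};
struct block_info registry[MAX_BLOCKS];

/* Record that [p, p + size) is now a valid block.
 * Returns 0 if the table is full. */
int note_alloc(const void *p, size_t size)
{
    int i;
    assert(p != NULL);
    for (i = 0; i < MAX_BLOCKS; i++) {
        if (!registry[i].live) {
            registry[i].start = (const char *) p;
            registry[i].size = size;
            registry[i].live = 1;
            return 1;
        }
    }
    return 0;
}

/* Record that the block starting at p has been released. */
int note_unalloc(const void *p)
{
    int i;
    for (i = 0; i < MAX_BLOCKS; i++) {
        if (registry[i].live && registry[i].start == (const char *) p) {
            registry[i].live = 0;
            return 1;
        }
    }
    return 0;   /* not a live block: double free or bad pointer */
}

/* Returns 1 if a write of n bytes at p stays inside the live block
 * that starts at p. */
int write_fits(const void *p, size_t n)
{
    int i;
    for (i = 0; i < MAX_BLOCKS; i++)
        if (registry[i].live && registry[i].start == (const char *) p)
            return n <= registry[i].size;
    return 0;
}
```

Once a 32-byte block is registered, a 60-byte write to it fails `write_fits`, which corresponds to the overflow reported in the message above. After `note_unalloc`, any further write through the same pointer is rejected as well.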

This kind of extensibility is a key factor of Insure++'s advanced memory tracking system and source code instrumentation technology.

Purify's object code insertion is fixed and stagnant. So much so, that just to compete they have patented their simple-to-implement algorithms to keep out competitors like TestCenter. Insure++, on the other hand, competes by being technologically superior, and on the true cutting edge of automatic bug detection techniques.

Conclusion

By implementing a simple and quick algorithm, Purify has a speed advantage over Insure++. That's it. While it's true that Purify checks third-party libraries which Insure++ does not, the goal of the product is not to find bugs in object modules which you don't have the source code for and cannot fix, but to find bugs in your own source code, which can be fully instrumented by Insure++, and only by Insure++.

For true state of the art error checking, Insure++ delivers exactness, whereas Purify relies on heuristics.
