
Implementation Notes

This appendix contains information mainly of interest to implementors and maintainers of gawk. Everything in it applies specifically to gawk, and not to other implementations.

Downwards Compatibility and Debugging

See the section "Extensions In gawk Not In S5R4" for a summary of the GNU extensions to the awk language and program. All of these features can be turned off either by compiling gawk with `-DSTRICT' (not recommended), or by invoking gawk with the `-c' option.

If gawk is compiled for debugging with `-DDEBUG', then two more options are available on the command line:

`-d'
Print out debugging information during execution.

`-D'
Print out the parse stack information as the program is being parsed.

Both of these options are intended only for serious gawk developers, and not for the casual user. They probably have not even been compiled into your version of gawk, since they slow down execution.

The code for recognizing special file names such as `/dev/stdin' can be disabled at compile time with `-DNO_DEV_FD', or with `-DSTRICT'.
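
As a rough illustration of how such a compile-time switch can work, the following C sketch guards a special-file-name check with the two macros named above. The function name `special_file_fd' and its return convention are hypothetical; they are not taken from the gawk sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not from the gawk sources: map a special file
 * name to a file descriptor, or return -1 for an ordinary file name.
 * Compiling with -DNO_DEV_FD or -DSTRICT removes the recognition code,
 * so every name is then treated as an ordinary file.
 */
int special_file_fd(const char *name)
{
    assert(name != NULL);
#if !defined(NO_DEV_FD) && !defined(STRICT)
    if (strcmp(name, "/dev/stdin") == 0)
        return 0;
    if (strcmp(name, "/dev/stdout") == 0)
        return 1;
    if (strcmp(name, "/dev/stderr") == 0)
        return 2;
#endif
    return -1;
}
```

Built without either macro, the sketch maps `/dev/stdin' to descriptor 0; built with `-DNO_DEV_FD' or `-DSTRICT', it returns -1 for every name.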

Probable Future Extensions

This section briefly lists extensions that indicate the directions we are currently considering for gawk.

ANSI C compatible printf
The printf and sprintf functions may be enhanced to be fully compatible with the specification for the printf family of functions in ANSI C.

RS as a regexp
The meaning of RS may be generalized along the lines of FS.

Control of subprocess environment
Changes made in gawk to the array ENVIRON may be propagated to subprocesses run by gawk.
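
One way such propagation could be done is sketched below in C, assuming a POSIX system that provides setenv. The `env_entry' structure and the `export_environ' function are hypothetical stand-ins for the awk-level ENVIRON array; they do not describe gawk's internals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one element of the awk-level ENVIRON array. */
struct env_entry {
    const char *name;
    const char *value;
};

/*
 * Copy each entry into the real process environment with setenv(3),
 * so that commands started afterwards (for example with system(3) or
 * popen(3)) inherit the changes.  Returns 0 on success, or -1 on the
 * first failure.
 */
int export_environ(const struct env_entry *entries, size_t n)
{
    size_t i;

    assert(entries != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (setenv(entries[i].name, entries[i].value, 1) != 0)
            return -1;
    }
    return 0;
}
```

Calling this just before a subprocess is started would keep the cost of the copy off the path of programs that never run a command.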

Data bases
It may be possible to map an NDBM/GDBM file into an awk array.

Single-character fields
The null string, "", as a field separator, would cause field splitting and the split function to separate individual characters. Thus, split("abcd", a, "") would yield a[1] == "a", a[2] == "b", and so on.

Fixed-length fields and records
A mechanism may be provided to allow the specification of fixed length fields and records.
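
The text does not say what form this mechanism would take. Purely as an illustration, the C sketch below cuts a record into fields from a list of widths; the function `split_fixed' and its calling convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: cut `record` into fields of the given widths.
 * Field i is copied into out[i], which must have room for
 * widths[i] + 1 bytes.  Returns the number of fields produced; a
 * record that ends early yields a short final field and no more.
 */
size_t split_fixed(const char *record, const size_t *widths,
                   size_t nwidths, char **out)
{
    size_t len, pos = 0, i;

    assert(record != NULL && widths != NULL && out != NULL);
    len = strlen(record);
    for (i = 0; i < nwidths && pos < len; i++) {
        size_t n = widths[i];

        if (n > len - pos)
            n = len - pos;      /* truncated final field */
        memcpy(out[i], record + pos, n);
        out[i][n] = '\0';
        pos += n;
    }
    return i;
}
```

With widths 3, 2 and 4, the record "abcdefghi" splits into "abc", "de" and "fghi".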

Regexp syntax
The egrep syntax for regular expressions, now specified with the `-e' option, may become the default, since the POSIX standard may specify this.

Suggestions for Improvements

Here are some projects that would-be gawk hackers might like to take on. They vary in size from a few days to a few weeks of programming, depending on which one you choose and how fast a programmer you are. Please send any improvements you write to the maintainers at the GNU project.

  1. State machine regexp matcher: At present, gawk uses the backtracking regular expression matcher from the GNU subroutine library. If a regexp is going to be used many times, it is faster to convert it once to a description of a finite state machine, then run a routine simulating that machine every time you want to match the regexp. You might be able to use the matching routines used by GNU egrep.

  2. Compilation of awk programs: gawk uses a Bison (YACC-like) parser to convert the script given to it into a syntax tree; the syntax tree is then executed by a simple recursive evaluator. Both of these steps incur a lot of overhead, since parsing can be slow (especially if you also do the previous project and convert regular expressions to finite state machines at compile time) and the recursive evaluator performs many procedure calls to do even the simplest things.

    It should be possible for gawk to convert the script's parse tree into a C program which the user would then compile, using the normal C compiler and a special gawk library to provide all the needed functions (regexps, fields, associative arrays, type coercion, and so on).

    An easier possibility might be for an intermediate phase of gawk to convert the parse tree into a linear byte code form like the one used in GNU Emacs Lisp. The recursive evaluator would then be replaced by a straight-line byte code interpreter that would be intermediate in speed between running a compiled program and doing what gawk does now.

  3. An error message section has not been included in this version of the manual. Perhaps some nice beta testers will document some of the messages for the future.
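
To make the first project more concrete, here is a minimal C sketch of the "simulate the machine" half of the idea. It assumes the transition table has already been built; the table below is written by hand for the anchored regexp ^ab*c$, whereas a real matcher would generate it from the regexp. All names are hypothetical, and nothing here is taken from GNU egrep.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hand-built transition table for the anchored regexp ^ab*c$.
 * States: START = nothing read yet, BODY = read "a" followed by any
 * number of "b", ACCEPT = whole pattern read, DEAD = no match possible.
 */
enum { S_START, S_BODY, S_ACCEPT, S_DEAD, N_STATES };
enum { C_A, C_B, C_C, C_OTHER, N_CLASSES };

static const unsigned char dfa_next[N_STATES][N_CLASSES] = {
    /*             a       b       c         other  */
    /* START  */ { S_BODY, S_DEAD, S_DEAD,   S_DEAD },
    /* BODY   */ { S_DEAD, S_BODY, S_ACCEPT, S_DEAD },
    /* ACCEPT */ { S_DEAD, S_DEAD, S_DEAD,   S_DEAD },
    /* DEAD   */ { S_DEAD, S_DEAD, S_DEAD,   S_DEAD },
};

/* Map an input character to its column in the table. */
static int char_class(char ch)
{
    switch (ch) {
    case 'a': return C_A;
    case 'b': return C_B;
    case 'c': return C_C;
    default:  return C_OTHER;
    }
}

/* Run the machine: one table lookup per input character, no backtracking. */
int dfa_match(const char *s)
{
    int state = S_START;

    assert(s != NULL);
    for (; *s != '\0' && state != S_DEAD; s++)
        state = dfa_next[state][char_class(*s)];
    return state == S_ACCEPT;
}
```

The cost of building the table is paid once per regexp; each later match then takes time proportional to the length of the input, which is why the approach pays off for regexps that are used many times.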

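The byte code idea from the second project can be sketched in the same spirit. The C fragment below runs a linear instruction array with a single loop and an explicit operand stack. The opcodes, the `insn' structure and the function name are hypothetical; they are neither gawk's nor those of GNU Emacs Lisp, and a real design would need many more instructions (fields, arrays, strings, control flow).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny arithmetic stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct insn {
    enum opcode op;
    double arg;              /* used only by OP_PUSH */
};

#define STACK_MAX 64

/*
 * Execute a linear instruction array with one loop and an explicit
 * operand stack, instead of one recursive call per tree node.
 * Returns the value on top of the stack when OP_HALT is reached.
 */
double run_bytecode(const struct insn *code)
{
    double stack[STACK_MAX];
    size_t sp = 0;           /* number of values on the stack */
    const struct insn *pc;

    assert(code != NULL);
    for (pc = code; ; pc++) {
        switch (pc->op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = pc->arg;
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_MUL:
            assert(sp >= 2);
            stack[sp - 2] *= stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
}
```

For example, the expression (2 + 3) * 4 would be flattened to PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT, and the interpreter evaluates it without any recursion.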