This appendix contains information mainly of interest to implementors and
maintainers of gawk. Everything in it applies specifically to gawk, and not
to other implementations.
See section Extensions In gawk Not In S5R4, for a summary of the GNU
extensions to the awk language and program. All of these features can be
turned off either by compiling gawk with `-DSTRICT' (not recommended), or by
invoking gawk with the `-c' option.
If gawk is compiled for debugging with `-DDEBUG', then there are two more
options available on the command line.
Both of these options are intended only for serious gawk developers, and not
for the casual user. They probably have not even been compiled into your
version of gawk, since they slow down execution.
The code for recognizing special file names such as `/dev/stdin' can be disabled at compile time with `-DNO_DEV_FD', or with `-DSTRICT'.
This section briefly lists extensions that indicate the directions we are
currently considering for gawk.
printf
The printf and sprintf functions may be enhanced to be fully compatible with
the specification for the printf family of functions in ANSI C.
RS as a regexp
RS may be generalized along the lines of FS.
Changes made in gawk to the array ENVIRON may be propagated to subprocesses
run by gawk.
awk
array.
The null string, "", as a field separator, will cause field splitting and the
split function to separate individual characters. Thus, split("abcd", a, "")
would yield a[1] == "a", a[2] == "b", and so on.
The egrep syntax for regular expressions, now specified with the `-e' option,
may become the default, since the POSIX standard may specify this.
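The single-character splitting proposed above for a null field separator can be illustrated with a short C sketch. It only mimics the described semantics and is not taken from the gawk sources; the function name `split_chars` and the fixed-size array of one-character strings are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mimic the proposed split(s, a, "") by storing each
 * character of s as its own one-character string in a[1], a[2], ...
 * Row a[0] is left unused so that indices match awk's 1-based arrays.
 * max is the number of rows in a.  Returns the number of elements stored,
 * as awk's split function returns the number of pieces. */
size_t split_chars(const char *s, char a[][2], size_t max)
{
    size_t n = 0;

    assert(s != NULL && a != NULL);
    while (s[n] != '\0' && n + 1 < max) {
        a[n + 1][0] = s[n];   /* the character itself */
        a[n + 1][1] = '\0';   /* terminate the one-character string */
        n++;
    }
    return n;
}
```

Called on "abcd" with an array of eight rows, this stores "a" in a[1], "b" in a[2], and so on, and returns 4.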
Here are some projects that would-be gawk hackers might like to take on. They
vary in size from a few days to a few weeks of programming, depending on
which one you choose and how fast a programmer you are. Please send any
improvements you write to the maintainers at the GNU project.
gawk uses the backtracking regular expression matcher from the GNU subroutine
library. If a regexp is really going to be used a lot of times, it is faster
to convert it once to a description of a finite state machine, then run a
routine simulating that machine every time you want to match the regexp.
You might be able to use the matching routines used by GNU egrep.
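The "build once, simulate many times" idea can be sketched in C with a table-driven machine. This is a minimal illustration, not the GNU egrep matcher: the table for the regexp ab*c is written out by hand rather than produced by a regexp compiler, it matches against the whole string only, and the names `struct dfa`, `dfa_build_ab_star_c` and `dfa_match` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NSTATES 3
#define DEAD    (-1)

/* next[state][byte] gives the following state, or DEAD if there is none. */
struct dfa {
    int next[NSTATES][256];
    int accepting[NSTATES];
    int start;
};

/* Build, once, the machine for the regexp ab*c.
 * State 0: nothing read yet.
 * State 1: read 'a' followed by any number of 'b'.
 * State 2: read the final 'c' (the only accepting state). */
void dfa_build_ab_star_c(struct dfa *m)
{
    int s, c;

    assert(m != NULL);
    for (s = 0; s < NSTATES; s++) {
        m->accepting[s] = 0;
        for (c = 0; c < 256; c++)
            m->next[s][c] = DEAD;
    }
    m->start = 0;
    m->next[0]['a'] = 1;
    m->next[1]['b'] = 1;
    m->next[1]['c'] = 2;
    m->accepting[2] = 1;
}

/* Simulate the machine: one table lookup per input byte, no backtracking.
 * Returns 1 if the whole string matches, 0 otherwise. */
int dfa_match(const struct dfa *m, const char *s)
{
    int state;

    assert(m != NULL && s != NULL);
    state = m->start;
    for (; *s != '\0'; s++) {
        state = m->next[state][(unsigned char)*s];
        if (state == DEAD)
            return 0;
    }
    return m->accepting[state];
}
```

The cost of building the table is paid once; every later call to `dfa_match` does a fixed amount of work per input character, which is where the speedup over a backtracking matcher would come from.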
Compilation of awk programs: gawk uses a Bison (YACC-like) parser to convert
the script given it into a syntax tree; the syntax tree is then executed by a
simple recursive evaluator. Both of these steps incur a lot of overhead,
since parsing can be slow (especially if you also do the previous project and
convert regular expressions to finite state machines at compile time) and the
recursive evaluator performs many procedure calls to do even the simplest
things.
It should be possible for gawk to convert the script's parse tree into a C
program which the user would then compile, using the normal C compiler and a
special gawk library to provide all the needed functions (regexps, fields,
associative arrays, type coercion, and so on).
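As a rough illustration of turning a parse tree into C text, the sketch below walks a tiny arithmetic tree and writes the corresponding C expression into a buffer. Everything here is an assumption made for the example: the node layout, the names `struct node`, `append` and `emit_expr`, and the restriction to numeric constants, addition and multiplication. A real translator would emit calls into the special gawk library for fields, arrays and type coercion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum kind { NUM, ADD, MUL };

/* Hypothetical parse tree node for a numeric expression. */
struct node {
    enum kind kind;
    double value;               /* used when kind == NUM */
    const struct node *left;    /* used when kind is ADD or MUL */
    const struct node *right;
};

/* Append text to buf (capacity size, current length *len).
 * Returns 0 on success, -1 if the text does not fit. */
static int append(char *buf, size_t size, size_t *len, const char *text)
{
    size_t n = strlen(text);

    if (n >= size - *len)
        return -1;
    memcpy(buf + *len, text, n + 1);   /* copy including the terminator */
    *len += n;
    return 0;
}

/* Emit the C expression for tree n, fully parenthesized.
 * Returns 0 on success, -1 if the buffer is too small. */
int emit_expr(const struct node *n, char *buf, size_t size, size_t *len)
{
    char num[32];

    assert(n != NULL && buf != NULL && len != NULL && *len < size);
    switch (n->kind) {
    case NUM:
        snprintf(num, sizeof num, "%g", n->value);
        return append(buf, size, len, num);
    case ADD:
    case MUL:
        if (append(buf, size, len, "(") != 0 ||
            emit_expr(n->left, buf, size, len) != 0 ||
            append(buf, size, len, n->kind == ADD ? " + " : " * ") != 0 ||
            emit_expr(n->right, buf, size, len) != 0 ||
            append(buf, size, len, ")") != 0)
            return -1;
        return 0;
    }
    return -1;
}
```

The generated text would then be wrapped in a function body and handed to the C compiler, so the tree is walked once at translation time instead of on every evaluation.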
An easier possibility might be for an intermediate phase of awk to convert
the parse tree into a linear byte code form like the one used in GNU Emacs
Lisp. The recursive evaluator would then be replaced by a straight line byte
code interpreter that would be intermediate in speed between running a
compiled program and doing what gawk does now.
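A straight line byte code interpreter of the kind suggested here can be sketched as a single loop over an instruction array with an explicit operand stack. The instruction set below (push, add, multiply, halt) and the names `struct instr` and `bytecode_run` are invented for the example; they are not the GNU Emacs Lisp byte code, nor anything present in gawk.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* One instruction of the hypothetical linear byte code. */
struct instr {
    enum opcode op;
    double operand;   /* used only by OP_PUSH */
};

#define STACK_MAX 64

/* Execute the instruction array in order, using an explicit operand stack
 * instead of recursive procedure calls.  Returns the value on top of the
 * stack when OP_HALT is reached. */
double bytecode_run(const struct instr *code)
{
    double stack[STACK_MAX];
    size_t sp = 0;               /* number of values on the stack */
    const struct instr *pc;      /* current instruction */

    assert(code != NULL);
    for (pc = code; ; pc++) {
        switch (pc->op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = pc->operand;
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_MUL:
            assert(sp >= 2);
            stack[sp - 2] *= stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
}
```

For the expression (2 + 3) * 4 the linear form would be: push 2, push 3, add, push 4, multiply, halt. The loop handles each instruction with one switch dispatch, where a recursive evaluator would make a procedure call per tree node.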