
Compiling Files

Note: the procedures described in this section are only available in the `compiler.com' world image. Furthermore, cf is only available on machines that support native-code compilation.

Compilation Procedures

procedure+: cf filename [destination]

This is the program that transforms a source-code file into native-code binary form. If destination is not given, as in

(cf "foo")

cf compiles the file `foo.scm', producing the file `foo.com' (incidentally it will also produce `foo.bin', `foo.bci', and possibly `foo.ext'). If you later evaluate

(load "foo")

`foo.com' will be loaded rather than `foo.scm'.

If destination is given, it says where the output files should go. If this argument is a directory, they go in that directory, e.g.:

(cf "foo" "../bar/")

will take `foo.scm' and generate the file `../bar/foo.com'. If destination is not a directory, it is the root name of the output:

(cf "foo" "bar")

takes `foo.scm' and generates `bar.com'.

About the `.bci' files: these files contain the debugging information that Scheme uses when you call debug to examine compiled code. When you load a `.com' file, Scheme remembers where it was loaded from, and when the debugger (or pp) looks at the compiled code from that file, it attempts to find the `.bci' file in the same directory from which the `.com' file was loaded. Thus it is a good idea to leave these files together.

`.bci' files are stored in a compressed format. The debugger has to uncompress the files when it looks at them, and on a slow machine this can take a noticeable time. The system takes steps to reduce the impact of this behavior: debugging information is cached in memory, and uncompressed versions of `.bci' files are kept around. The default behavior is that a temporary file is created and the `.bci' file is uncompressed into it. The temporary file is kept around for a while afterwards, and during that time if the uncompressed `.bci' file is needed the temporary file is used. Each such reference updates an `access time' that is associated with the temporary file. The garbage collector checks the access times of all such temporary files, and deletes any that have not been accessed in five minutes or more. All of the temporaries are deleted automatically when the Scheme process is killed.

Two other behaviors are available. One of them uncompresses the `.bci' file each time it is referenced, and the other uncompresses the `.bci' file and writes it back out as a `.bif' file (the old default). The `.bif' file remains after Scheme exits. The time interval and the behavior are controlled by variables. (These variables are not in the global environment; perhaps they should be. They are in the (runtime compiler-info) package environment.)

variable+: *save-uncompressed-files?*

This variable affects what happens when `.bci' files are uncompressed. It allows a trade-off between performance and disk space. There are three possible values:

#f: The uncompressed versions of `.bci' files are never saved. Each time the information is needed the `.bci' file is uncompressed. This option requires the minimum amount of disk space and is the slowest.
automatic: Uncompressed versions of `.bci' files are kept as temporary files. The temporary files are deleted when Scheme exits, or earlier if they have not been used for a while. This is the default.
#t: The `.bci' files are uncompressed to permanent `.bif' files. These files remain on disk after Scheme exits, and are rather large - about twice the size of the corresponding `.bci' files. If you choose this option and you are running out of disk space you may delete the `.bif' files. They will be regenerated as needed.

variable+: *uncompressed-file-lifetime*

The minimum length of time that a temporary uncompressed version of a `.bci' file will stay on disk after it is last used. The time is in milliseconds; the default is `300000' (five minutes).
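Because these variables live in the (runtime compiler-info) package rather than the global environment, they have to be assigned through that environment. The following is a minimal sketch, assuming the variables exist under these names in the release you are running; compiler-info-env is a name introduced here only for illustration.

```scheme
;; Package environment named in the text above.
;; compiler-info-env is a hypothetical helper name for this sketch.
(define compiler-info-env
  (->environment '(runtime compiler-info)))

;; Never keep uncompressed copies: least disk space, slowest debugger.
(set! (access *save-uncompressed-files?* compiler-info-env) #f)

;; Alternatively, keep temporary files twice as long as the
;; documented default of 300000.
(set! (access *uncompressed-file-lifetime* compiler-info-env) 600000)
```

Only one of the two assignments is normally useful at a time, since the lifetime matters only when temporary files are kept.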

variable+: load-debugging-info-on-demand?

If this variable is `#f', then printing a compiled procedure will print the procedure's name only if the debugging information for that procedure is already loaded. Otherwise, it will force the loading of the debugging information. The default value is #f.

procedure+: sf filename [destination]

sf is the program that transforms a source-code file into binary SCode form; it is used on machines that do not support native-code compilation. It performs numerous optimizations that can make your programs run considerably faster than unoptimized interpreted code. Also, the binary files that it generates load very quickly compared to source-code files.

The simplest way to use sf is just to say:

(sf filename)

This will cause your file to be transformed, and the resulting binary file to be written out with the same name, but with pathname type "bin". If you do not specify a pathname type on the input file, "scm" is assumed.

Like load, the first argument to sf may be a list of filenames rather than a single filename.

sf takes an optional second argument, which is the filename of the output file. If this argument is a directory, then the output file has its normal name but is put in that directory instead.
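The calling conventions described above can be summarized in a short sketch. The file names `foo' and `bar' and the directory `../bar/' are placeholders, and the files are assumed to exist.

```scheme
;; Single file: reads foo.scm and writes foo.bin.
(sf "foo")

;; A list of filenames, processed one after another.
(sf '("foo" "bar"))

;; Directory as destination: the output keeps its normal name
;; but is written into ../bar/.
(sf "foo" "../bar/")
```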

Declarations

Several declarations can be added to your programs to help cf and sf make them more efficient.

Standard Names

Normally, all files have a line

(declare (usual-integrations))

near their beginning, which tells the compiler that free variables whose names are defined in system-global-environment will not be shadowed by other definitions when the program is loaded. If you redefine some global name in your code, for example car, cdr, and cons, you should indicate it in the declaration:

(declare (usual-integrations car cdr cons))

You can obtain an alphabetically-sorted list of the names that the usual-integrations declaration affects by evaluating the following expression:

(eval '(sort (append usual-integrations/constant-names
                     usual-integrations/expansion-names)
             (lambda (x y)
               (string<=? (symbol->string x)
                          (symbol->string y))))
      (->environment '(scode-optimizer)))

In-line Coding

Another useful facility is the ability to in-line code procedure definitions. In fact, the compiler will perform full beta conversion, with automatic renaming, if you request it. Here are the relevant declarations:

declaration+: integrate name ...

The variable names must be defined in the same file as this declaration. Any reference to one of the named variables that appears in the same block as the declaration, or one of its descendant blocks, will be replaced by the corresponding definition's value expression.
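As a small illustration, the sketch below integrates a constant. The names buffer-size and make-buffer are hypothetical and chosen only for this example; the program behaves the same whether or not the integration is performed.

```scheme
(declare (usual-integrations))

;; buffer-size is a hypothetical name used only for this sketch.
;; It is defined in the same file as the declaration, as required.
(declare (integrate buffer-size))
(define buffer-size 1024)

(define (make-buffer)
  ;; With the declaration in effect, this reference may be replaced
  ;; by the value expression of the definition, here the literal 1024.
  (make-vector buffer-size 0))
```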

declaration+: integrate-operator name ...

Similar to the integrate declaration, except that it only substitutes for references that appear in the operator position of a combination. All other references are ignored.

declaration+: integrate-external filename

Causes the compiler to use the top-level integrations provided by filename. filename should not specify a file type, and the source-code file that it names must have been previously processed by the compiler.

If filename is a relative filename (the normal case), it is interpreted as being relative to the file in which the declaration appears. Thus if the declaration appears in file `/usr/cph/foo.scm', then the compiler looks for a file called `/usr/cph/filename.ext'.

Note: When the compiler finds top-level integrations, it collects them and outputs them into an auxiliary file with extension `.ext'. This `.ext' file is what the integrate-external declaration refers to.

Note that the most common use of this facility, in-line coding of procedure definitions, requires a somewhat complicated use of these declarations. Because this is so common, there is a special form, define-integrable, which is like define but performs the appropriate declarations. For example:

(define-integrable (foo-bar foo bar)
  (vector-ref (vector-ref foo bar) 3))

Here is how you do the same thing without this special form: there should be an integrate-operator declaration for the procedure's name, and (internal to the procedure's definition) an integrate declaration for each of the procedure's parameters, like this:

(declare (integrate-operator foo-bar))

(define foo-bar
  (lambda (foo bar)
    (declare (integrate foo bar))
    (vector-ref (vector-ref foo bar) 3)))

The reason for this complication is as follows: the integrate-operator declaration finds all the references to foo-bar and replaces them with the lambda expression from the definition. Then, the integrate declarations take effect because the combination in which the reference to foo-bar occurred supplies code which is substituted throughout the body of the procedure definition. For example:

(foo-bar (car baz) (cdr baz))

First use the integrate-operator declaration:

((lambda (foo bar)
   (declare (integrate foo bar))
   (vector-ref (vector-ref foo bar) 3))
 (car baz)
 (cdr baz))

Next use the internal integrate declaration:

((lambda (foo bar)
   (vector-ref (vector-ref (car baz) (cdr baz)) 3))
 (car baz)
 (cdr baz))

Next notice that the variables foo and bar are not used, and eliminate them:

((lambda ()
   (vector-ref (vector-ref (car baz) (cdr baz)) 3)))

Finally, remove the ((lambda () ...)) to produce

(vector-ref (vector-ref (car baz) (cdr baz)) 3)

Operator Replacement

The replace-operator declaration is provided to inform the compiler that certain operators may be replaced by other operators depending on the number of arguments. For example:

Declaration:

(declare (replace-operator (map (2 map-2) (3 map-3))))

Replacements:

(map f x y z) ==> (map f x y z)
(map f x y) ==> (map-3 f x y)
(map f x) ==> (map-2 f x)
(map f) ==> (map f)
(map) ==> (map)

Presumably map-2 and map-3 are efficient versions of map that are written for exactly two and three arguments respectively. All the other cases are not expanded but are handled by the original, general map procedure, which is less efficient because it must handle a variable number of arguments.
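The text does not define map-2 and map-3. One plausible shape for them, offered only as a sketch and not as the system's own definitions, is shown below. Note that the digits count all arguments including the procedure, so map-2 handles a single list and map-3 handles two lists.

```scheme
;; Hypothetical fixed-arity versions of map, written for this sketch.

;; map-2: two arguments in total, a procedure and one list.
(define (map-2 f xs)
  (if (pair? xs)
      (cons (f (car xs)) (map-2 f (cdr xs)))
      '()))

;; map-3: three arguments in total, a procedure and two lists.
;; Stops as soon as either list runs out.
(define (map-3 f xs ys)
  (if (and (pair? xs) (pair? ys))
      (cons (f (car xs) (car ys))
            (map-3 f (cdr xs) (cdr ys)))
      '()))
```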

declaration+: replace-operator name ...

The syntax of this declaration is

(replace-operator
  (name
    (nargs1 value1)
    (nargs2 value2)
    ...))

where

name is a symbol.
nargs1, nargs2 etc. are non-negative integers, or one of the following symbols: any, else or otherwise.
value1, value2 etc. are simple expressions in one of these forms:

'constant
A constant.

variable
A variable.

(primitive primitive-name [arity])
The primitive procedure named primitive-name. The optional element arity, a non-negative integer, specifies the number of arguments that the primitive accepts.

(global var)
A global variable.

The meanings of these fields are:

name is the name of the operator to be reduced. If it is not shadowed (for example, by a let) then it may be replaced according to the following rules.
If the operator has nargsN arguments then it is replaced with a call to valueN with the same arguments.
If the number of arguments is not listed, and one of the nargsN is any, else or otherwise, then the operation is replaced with a call to the corresponding valueN. Only one of the nargsN may be of this form.
If the number of arguments is not listed and none of the nargsN is any, else or otherwise, then the operation is not replaced.
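The rules above, including the else case, can be sketched with an operator of your own. All of the names below (largest, largest-1, largest-2, largest-n) are hypothetical and exist only for this example. The procedures are written so that the program computes the same results whether or not any replacement takes place.

```scheme
(declare (usual-integrations))

;; Hypothetical operator names; none of these come from the system.
(declare (replace-operator
          (largest (1 largest-1)
                   (2 largest-2)
                   (else largest-n))))

;; Fixed-arity versions used for calls with one or two arguments.
(define (largest-1 x) x)
(define (largest-2 x y) (if (< x y) y x))

;; General version, named by the else clause for all other cases.
(define (largest-n x . rest) (fold-left largest-2 x rest))

;; The original operator, still needed wherever no replacement is made.
(define (largest x . rest) (fold-left largest-2 x rest))

;; Following the rules in the text:
;;   (largest a)        may be replaced by (largest-1 a)
;;   (largest a b)      may be replaced by (largest-2 a b)
;;   (largest a b c d)  may be replaced by (largest-n a b c d)
```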

Operator Reduction

The reduce-operator declaration is provided to inform the compiler that certain names are n-ary versions of binary operators. Here are some examples:

Declaration:

(declare (reduce-operator (cons* cons)))

Replacements:

(cons* x y z w) ==> (cons x (cons y (cons z w)))
(cons* x y) ==> (cons x y)
(cons* x) ==> x
(cons*) error--> too few arguments

Declaration:

(declare (reduce-operator (list cons (null-value '() any))))

Replacements:

(list x y z w) ==> (cons x (cons y (cons z (cons w '()))))
(list x y) ==> (cons x (cons y '()))
(list x) ==> (cons x '())
(list) ==> '()

Declaration:

(declare (reduce-operator (- %- (null-value 0 single) (group left))))

Replacements:

(- x y z w) ==> (%- (%- (%- x y) z) w)
(- x y) ==> (%- x y)
(- x) ==> (%- 0 x)
(-) ==> 0

Declaration:

(declare (reduce-operator (+ %+ (null-value 0 none) (group right))))

Replacements:

(+ x y z w) ==> (%+ x (%+ y (%+ z w)))
(+ x y) ==> (%+ x y)
(+ x) ==> x
(+) ==> 0

Note: This declaration does not cause an appropriate definition of %+ (in the last example) to appear in your code. It merely informs the compiler that certain optimizations can be performed on calls to + by replacing them with calls to %+. You should provide a definition of %+ as well, although it is not required.
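The same point applies to operators of your own: the declaration and the definitions are separate things, and you supply both. Here is a minimal sketch, using the hypothetical names sum* and %sum rather than + and %+.

```scheme
(declare (usual-integrations))

;; sum* and %sum are hypothetical names chosen for this sketch.
(declare (reduce-operator (sum* %sum (null-value 0 none) (group right))))

;; The binary operation that reduced calls refer to.
;; The declaration does not create it, so it is defined here.
(define (%sum x y) (+ x y))

;; The general n-ary procedure, still needed for uses that are
;; not reduced, such as (apply sum* args).
(define (sum* . args) (fold-right %sum 0 args))

;; By analogy with the + example above:
;;   (sum* x y z) ==> (%sum x (%sum y z))
;;   (sum* x)     ==> x
;;   (sum*)       ==> 0
```

The n-ary definition is written to agree with the reduced forms, so the results do not depend on whether a given call was reduced.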

Declaration:

(declare (reduce-operator (apply (primitive cons)
                                 (group right)
                                 (wrapper (global apply) 1))))

Replacements:

(apply f x y z w) ==> ((access apply ()) f (cons x (cons y (cons z w))))
(apply f x y) ==> ((access apply ()) f (cons x y))
(apply f x) ==> (apply f x)
(apply f) ==> (apply f)
(apply) ==> (apply)

declaration+: reduce-operator name ...

The general format of the declaration is (brackets denote optional elements):

(reduce-operator
  (name
    binop
    [(group ordering)]
    [(null-value value null-option)]
    [(singleton unop)]
    [(wrapper wrap [n])]
    [(maximum m)]
  ))

where

n and m are non-negative integers.
name is a symbol.
binop, value, unop, and wrap are simple expressions in one of these forms:

'constant
A constant.

variable
A variable.

(primitive primitive-name [arity])
The primitive procedure named primitive-name. The optional element arity specifies the number of arguments that the primitive accepts.

(global var)
A global variable.

null-option is either always, any, one, single, none, or empty.
ordering is either left, right, or associative.

The meaning of these fields is:

name is the name of the n-ary operation to be reduced.
binop is the binary operation into which the n-ary operation is to be reduced.
The group option specifies whether name associates to the right or left.
The null-value option specifies a value to use in the following cases:

none
empty
When no arguments are supplied to name, value is returned.

one
single
When a single argument is provided to name, value becomes the second argument to binop.

any
always
binop is used on the "last" argument, and value provides the remaining argument to binop.

In the above options, when value is supplied to binop, it is supplied on the left if grouping to the left, otherwise it is supplied on the right.
The singleton option specifies a function, unop, to be invoked on the single remaining argument. This option supersedes the null-value option, which can only take the value none.
The wrapper option specifies a function, wrap, to be invoked on the result of the outermost call to binop after the expansion. If n is provided it must be a non-negative integer indicating a number of arguments that are transferred verbatim from the original call to the wrapper. They are passed to the left of the reduction.
The maximum option specifies that calls with more than m arguments should not be reduced.
