Find more Algol implementations with this Google search:

  http://www.google.com/search?hl=en&q=%22Syntax+of+ALGOL+60%22&btnG=Google+Search

and this wikipedia page:

  http://en.wikipedia.org/wiki/ALGOL


When reading http://www.heerdebeer.org/ALGOL/The_History_of_ALGOL.pdf
bear in mind the comments here:
http://compilers.iecc.com/comparch/article/06-09-070

The text below comes from http://www.gtoal.com/languages/algol60/427.pdf
-----------------------------------

WHETSTONE ALGOL REVISITED
OR
CONFESSIONS OF A COMPILER WRITER
by
B. RANDELL
The English Electric Co. Limited, Whetstone, Leics.
Paper presented at the first Compiler writing Seminar, 20th May, 1964,
Brighton College of Technology.
INTRODUCTION
A complete and detailed account of the Whetstone KDF9 Algol Compiler has
been given in the book 'Algol 60 Implementation' (4). The present paper uses the
wisdom born of hindsight in an attempt at a critical re-appraisal of the
Whetstone Compiler and the implementation techniques which are used in it.
DESIGN AIMS
The Whetstone Compiler, which is designed for use during the debugging
of Algol programs, consists of a Translator and a Control Routine. The
Translator is a single-pass translator producing an object program which is
interpreted at run time by the Control Routine. After an Algol program has been
debugged it can be translated using the Kidsgrove KDF9 Algol Compiler, a multipass
optimising compiler which produces a machine code object program. The
decision to develop an 'Algol System' incorporating two separate Algol compilers
was based on the fact that two of the most desirable qualities of a compiler,
namely speed of translation, and efficiency of object program produced, are to a
large extent incompatible. Thus in the Whetstone Compiler efficiency of object
program is sacrificed in order to obtain fast translation together with
extensive program checking and testing facilities, whilst in the Kidsgrove
Compiler considerable effort is put into transforming the Algol program into a
form which will make the best possible use of the computer hardware.
This 'twin compiler' technique, though in many ways very convenient,
does have certain disadvantages. Since the characteristics of the two compilers
are so distinct, a programmer must decide carefully when to assume that his
program is virtually error-free and worth compiling using the optimising
compiler. Furthermore the system is not very suitable for large programs which
are continually being refined and developed.
2
The most obvious alternative to the twin compiler system is the compiler
which is organised in such a way that the sections which attempt to optimise the
object program are bypassed unless the user specifies otherwise. Superficially
such a compiler would be just as satisfactory as two separate compilers, with
the additional advantage of making the change over from unoptimised to optimised
translation extremely easy. However, as will be seen from an examination of the
Whetstone and the Kidsgrove compilers, fast translation and optimised
translation can use widely differing techniques which may be difficult to
combine into a single compiler.
A near-ideal alternative would be an efficient compiler which could
incorporate minor changes to a source program with a minimum of retranslation
and which includes effective program checking and testing facilities. (A
possible method of implementing a compiler which permits such 'incremental
translation' has been described by Evans, Perlis and Van Zoeren (2).) The lack
of this ability to perform 'incremental translation' can be compensated for by
the manual splitting up of a program into 'segments' which are then translated
separately. When changes are made to a program only the segments which have been
affected have to be translated. In KDF9 Algol segments must be 'complete'
procedures (i.e. in which all identifiers are either formal parameters or local
to the procedure), and are also used to indicate program storage overlays.
However, because the two KDF9 Algol compilers produce such different styles of
object program, all segments of an Algol program must be processed by the same
compiler - this is unfortunate as the facility of combining segments produced by
either compiler would be very useful for large-scale programs which are
gradually modified and developed over a long period of time.
The original premise in this discussion of the twin compiler system was
that fast translation was incompatible with the production of efficient object
programs. This is surely true at the present time, since the production of an
efficient object program needs a sophisticated method of scanning a source
program in order to detect restricted uses of certain language forms which are
then treated specially so as to make the best possible use of all available
machine facilities. It is to be hoped that in the future the design of
programming languages and machines will become more systematic and elegant so
that, because of the lack of awkward restrictions and special cases, a more
straightforward process will suffice to transform a source program into an
efficient object program.
SINGLE-PASS TRANSLATION
One of the principal characteristics of the Whetstone Compiler is that
it performs the transformation of the Algol source program as the source program
is being read, so that the object program is complete and ready to be obeyed
almost immediately after the end of the Algol text has been reached. This socalled
'single-pass' method of translation has obvious advantages during program
testing, since the programmer can virtually ignore the existence of the
compiler, because his program, albeit in Algol, is read into the computer at
normal paper tape speed. These advantages are however largely nullified when
3
time-sharing machines, or a system for keeping programs which are currently
under test in some form of backing storage, are considered.
Single-pass translation, besides being a rather pleasing concept, is
also an interesting challenge to the compiler writer. The problem is that the
generation of object program coding must proceed virtually concurrently with the
analysis by the compiler of the language instructions forming the source program
- this analysis can be based only on the current portion of source program text,
together with any information which has been gleaned from text which has already
been scanned. There is no possibility of a preliminary scan to collect and
classify the various identifiers used in a program, leave alone the more
sophisticated forms of source program analysis which are attempted in many
multi-pass compilers.
The need to generate object program as the source program is analysed
would obviously be greatly facilitated if each section of source program could
be correctly and completely analysed using only information that has already
been scanned. This is unfortunately not the case with Algol, so the Whetstone
Compiler uses where necessary a technique of generating 'incomplete' object
program operations which are completed when the missing information becomes
available. This technique involves designing the object program so that the
various operations which would be produced from a temporarily ambiguous source
language construction have a basic similarity - this was possible in the
Whetstone Compiler where the object program was to be interpreted, and was
therefore free of the constraints imposed by machine hardware, such as the need
to give explicit instructions for floating integers.
Example
begin real procedure p;
begin ...
p = a + b;
...
end p ;
integer a; real b;
The declarations of the identifiers a and b which are used as nonlocal
variables inside procedure p can, as above, be given after the
declaration of p. Thus during the translation of the statement 'p := a + b'
it is not known whether a and b are real or integer variables or even real
or integer function designators without parameters. As a result the object
program operations produced by the Whetstone Compiler corresponding to
identifiers which are simple variables or parameter less function
4
designators take the same basic format, in fact an 8-bit operation code,
followed by a 16-bit address field.
The necessity of designing the object program so that certain classes of
operations have the same basic format would be removed if each operation carried
with it the address of its successor (an order code of this form was of course
common with early computers where 'optimum coding' was used to minimise drum
latency time, e.g. the IBM650). With such an order code the object program is
effectively stored as a 'list structure' and operations of varying size could
easily be added to or deleted from any part of the program at any stage during
translation. However an object program of this form would be considerably larger
than the equivalent serial object program because of all the extra program
addresses.
The technique used in the Whetstone Compiler to enable incomplete
operations to be replaced by complete operations when the missing information
becomes available is dependent on having the whole of the partially generated
object program available in core storage. The address fields of sequences of
incomplete operations corresponding to each identifier for which no declaration
information is available are 'chained' together, and the chains are traced
through when the operations and their address fields are being completed. Thus
although the Whetstone Compiler uses only a single scan through the Algol source
program, it does this at the cost of a set of what could be called small 'scans'
through short sequences of object program operations.
One of the most obvious ways of avoiding either some such technique of
chaining, or a preliminary scan of the Algol text just to collect declaration
information, is to impose the restriction that the declarations of all
identifiers other than labels must precede their use. (No ambiguity results if
only a single category of identifiers can be used before being declared - labels
are the obvious choice as the 'single category' because the translation of
forward jumps will in any case need some such technique as chaining). A
restriction of this form is normally a source of annoyance only to programmers
who have had access to a compiler which does not need this restriction, though
it does preclude the writing of explicit mutually recursive procedures. However
a pair of procedures can be made to call each other recursively without
infringing the rule that identifiers must be declared before being used, by
giving one procedure as a parameter to the other procedure.
Example
begin procedure p ; begin ... ; q ; ... end ;
procedure q ; begin ... ; p ; ... end ;
can be rewritten as
begin procedure p(Q) ; procedure Q; begin ... ; Q ; ... end ;
5
procedure q ; begin ... ; p(q) ; ... end ;
which has exactly the same effect, but avoids using the identifier q before
its declaration.
The technique of chaining used in the Whetstone Compiler to permit
single-pass translation of unrestricted Algol involves use of a 'name list'. The
name list is effectively a pushdown store containing items which give an
identifier and either its declaration details or the addresses of the start and
finish of the chain of incomplete operations which has been generated because
declaration information was not available. In a large program the name list can
take a considerable amount of storage, because at any given point during
translation the name list contains a number of groups of items, corresponding to
the current block and all its surrounding blocks, where each group contains one
three-word item for each different identifier that has been declared or used in
the corresponding block. In fact compared to the name list, the translator
stack, which is used to re-order arithmetic operations, and to analyse statement
structure, normally takes very little space. Thus on a KDF9 with a limited
amount of core storage, the amount of space required for the name list is one of
the main limiting factors on the size of Algol program that can be translated.
An obvious improvement that could be made to the compiler would be for it to use
backing storage for parts of the name list not in current use; this could be
arranged quite easily because the name list is organised effectively as a push
down store, and only the items corresponding to the current block, and
occasionally the immediately surrounding block, are ever needed at any one time.
It is because the name list contains entries corresponding to declared
identifiers which have been used but not declared in a block that only the top
one or two groups of items ever have to be searched to find whether there is an
entry corresponding to a given identifier. Thus the time taken to construct and
later to follow through the chains of incomplete operations is at least partly
offset by the fact that very little time is spent searching identifier lists.
However at first sight it seems unnecessary for the name list to contain an item
for each block which a given identifier is used.
Example
begin
real a, b;
a:= 0 ;
begin real c ;
c:= a ;
b:= 0 ;
6
...
After the translator has processed the statement 'b := 0' two
groups of items will have been set up in the name list. The first group,
corresponding to the outer block, will consist of two entries, giving
declaration information for the identifiers a and b; the second group,
corresponding to the inner block, will consist of an entry giving
declaration information for c, and two entries giving addresses of the
chains of incomplete operations generated from the uses of identifiers a
and b.
The declaration of an identifier can follow its use only if it is a
label, or it used within a procedure body as a non-local variable or within a
switch declaration (it is convenient to think of a switch as being a procedure
of type label). Thus in the above example within the inner block it is obvious,
from the way which they are used, that a and b are not labels, and hence must
correspond to the a and b declared to be real variables in the outer block.
Hence it would appear worththwhile to avoid generating name list entries,
corresponding to identifiers whose uses proves that they cannot be labels, if
the name list groups for any containing blocks (as opposed to procedures)
contain the same identifiers. However these apparently superfluous name list
entries also serve as a record that an identifier has been used as, say, a
variable in a given block, and hence cannot so be given as a label. Without
these name list entries, or some equivalent means storing this information, it
would not be possible for the translator to check out this type of error.
Example
begin real a ;
a := 0 ;
begin real b, c ;
a := b ;
a: b := b + 1 ;
...
The identifier a in the statement 'a := b' would be taken as
referring to the real variable a declared in the outer block. However, at
the point where a is given as a label an entry would be added to the group
of name list entries corresponding to the inner block, and the inconsistent
use being made of a would be detected.
A possible compromise would be for the translator to search the groups
of name list entries for the surrounding blocks until a procedure block is
reached, and to generate chains of incomplete operations only for those
7
identifiers for which no name list entries are found, but to still add an entry
to the name list for each identifier which is used, so that inconsistent uses of
identifiers are still detected. Thus one could save dome chaining at the cost of
some extra name list searching.
OBJECT PROGRAM INTERPRETATION
The second major characteristic of the Whetstone Compiler is that the
object program is not in machine code, but is obeyed interpretively by a
'Control Routine'. The most important reason for this is that it permits a
complete check to be made of all the actions of the object program and all
communication between program and programmer to be given in terms of the
original Algol source program. Thus the result of each arithmetic operation is
checked to be within the permitted range for integers or for real numbers, as
appropriate; and the address given by each subscripted variable is checked to be
within the storage area allocated to the corresponding array. Obviously the
increased running time caused by interpretation can only be justified while a
program is being checked out, and perhaps in some cases only during the initial
stages of checking out.
Since the object program is being interpreted it is possible to allow
the Control Routine to perform at run time some functions, such as deciding when
real/integer conversions are necessary, which the single pass translation
technique made inconvenient to perform at translation time. Thus the Control
Routine completes the checking of the consistency of use of real, integer and
Boolean variables, and of formal parameters with their corresponding actual
parameters, and simulates automatic stack hardware working in the core store.
The first function of the Control Routine, the checking of all
arithmetic operations, would well be performed be suitable checking and
interruption hardware, linked to a protected 'director' or 'master control
program', thus allowing the object program to be given in machine code. The
director would use tables produced by the compiler so that any communication
with the programmer could still be in terms of the original Algol source
program.
The second $unction of the Control Routine, the checking of the
consistency of use of variables of the various types, and the related question
of deciding when real/integer conversions are necessary, is not performed by the
Translator because the necessary declaration information is not usually
available so that incomplete object program operations have to be generated and
completed later. In fact the Control Routine stores the intermediate results of
arithmetic expressions as double-word 'accumulators', one word containing the
numerical value of the intermediate result, the other a bit pattern indicating
its type; the action of each arithmetic operation is dependent on the bit
patterns given in the accumulators containing its operands.
The third function of the Control Routine is to organise the core
storage as a stack for the storage of variables and arrays, and for the
8
calculation of expressions. As a result the object program representation of
arithmetic expressions can be given in a form of Reverse Polish, and the
Translator does not have to assign addresses to the intermediate results of
expressions. This makes the object program very compact but means that the
hardware facilities, such as the nesting store accumulator, provided on the KDF9
computer are virtually ignored. A description of the Kidsgrove which makes full
use of all these facilities, has been given by Hawkins and Huxtable (3).
However, it is possible that a compromise could be found between the facilities
demanded by the object program (here provided by means a Control Routine) and
the facilities which would be provided by machine hardware, which would enable a
reasonably simple and elegant translator to be used to produce machine code
object program which would run with an acceptable level of efficiency.
TRANSLATION TECHNIQUES
The Whetstone Compiler uses a development of the stack and priority
level translation technique first described by Dijkstra (1) to analyse the Algo1
source program and to control the generation of the corresponding object
program. The Translator is thus largely composed of a set of separate routines,
corresponding to the various delimiters, which use the translator stack and a
set of 'state variables', in which information about earlier sections of the
source program is recorded.
The system of translation, though perhaps a little ad hoc, particularly
in choice of state variables, has proved most satisfactory in practice,
particularly regards speed of translation; hence the author's slight preference
for a system of translation using syntactic routines (i.e. routines
corresponding to the various syntactic units of Algol, which can call each other
recursively) is based solely on the feeling that such a system is more elegant
and is more directly related to the formal definition of the Algol language. In
fact if an Algol compiler was available, perhaps on another computer, for use
during the development of a translator there would be a strong incentive to base
the translator on the syntactic routine system, and to benefit from the ease
with which recursive routines could be written.
As has been mentioned earlier, the Whetstone Compiler, because it is
performing a single pass translation on unrestricted Algol, does not have
available the necessary declaration information which is required to analyse, at
translation time, the types of the partial results that will be produced at run
time during the evaluation of expressions. In fact if the declaration
information was available then it would in general be fairly easy for the
translator to perform all the necessary analysis and checking of types, and to
generate the appropriate real/integer conversion instructions (see, for example,
Randell and Russell (5), which also describes a simple method of computing at
translation time any operations whose operands are numerical constants). However
certain minor features of Algol are not immediately amenable to the use of
translation-time type assessment. The first problem is that of the type of
conditional expressions.
9
Example
x: = (if i=O then x else i) X (if j=0 then y else j) ;
If x and y are real variables and i and j are integers then,
depending on the values of i and j, which are of course not known at
translation time, the multiplication can be of two real numbers, two
integers, or one real number and one integer. A reasonable solution to
this problem is to define the type of a conditional expression to be
real if any of the constituent simple expressions are of type real. The
second problem, to which there is apparently no satisfactory
unrestrictive solution, concerns the exponentiation operation.
Example
i ! j
where i and j are integers, is of type real if j is negative, but
otherwise is of type integer.
As it is, at translation time the Whetstone Compiler does a partial
check on the correctness of the types of identifiers used in expressions, by
using the rules pertaining to a given operator to check, where possible, the
types of identifiers which appear adjacent to the operator as its operands. No
attempt is made to check that the types of the partial results that will be
produced during the expression are compatible, and no use is made of any
declaration information known about an identifier should it be available. As
information about an identifier for which no declaration information is
available is gradually accumulated from the various ways in which the identifier
is used, a bit pattern is constructed in the corresponding name list entry. When
the declaration information becomes available it is checked that it is
compatible with the information given by the bit pattern. The bit pattern is
formed from two constituent bit patterns, one indicating whether the identifier
has been followed by an opening parenthesis or square bracket, the other giving
type information. It has since been realised that because the list entry also
contains an entry which indicates the number of dimensions if it is an array (or
parameters if a procedure), and which is set to zero for scalars, the use of the
first part of the bit pattern is an unnecessary complication.
The naive way in which for statements and subscripted variables are
treated in the Whetstone Compiler could not be tolerated if it was not designed
to be part a twin compiler system. No attempt is made to recognise simple uses
of for statements or subscripted variables; instead the Control Routine always
allows for the possibility that the full generality permitted in for statements
and subscripted variables is being used. By way of contrast, in the Kidsgrove
Optimising Compiler (see Hawkins and Huxtable (3)) considerable effort is
expended on finding cases of restricted uses of for statements and subscripted
variables, which are then translated into a form which makes efficient use of
the KDF9's powerful indexing and loop counting facilities.
10
CONCLUSIONS
The design of the Whetstone Compiler has been greatly influenced by the
fact that it is to be part of a twin compiler system. However the basic design
could be used without too many major modifications to produce a fairly fast
compiler which would generate an object program more nearly comparable with
ordinary machine code. On the other hand the problem of efficient implementation
of for statements and subscripted variables requires a more major effort, and is
very dependent on the quality of the machine's indexing facilities and order
code.
ACKNOWLEDGEMENTS
Mr. L. J. Russell, who has played a major part in the development and
implementation of the Whetstone Compiler, has also assisted the author in the
preparation of this paper, which is published by permission of the English
Electric Company Limited.
REFERENCES
1. Dijkstra, E.W. (1961). Making a Translator for Algol 60. A.P.I.C. Bull. 7,
pp. 3-11.
2. Evans, A., Perlis, A.J. and van Zoeren, H. (1961). The Use of Threaded Lists
in Constructing a Combined Algol and Machine-Like Assembly Processor.
Comm. A.C.M. 4, 1, pp. 36-41.
3. Hawkins, E.N. and Huxtable, D.H.R. (1963). A Multi-Pass Translation Scheme
for Algol 60. "Annual Review in Automatic' Programming", vol. 3,
pp.163-205. Pergamon Press, Oxford.
4. Randell, B. and Russell, 1.J. (1964). "Algol 60 Implementation". Academic
Press, London.
5. Randell, B. and Russell, L.J. (1964). Single-Scan Techniques for the
Translation of Arithmetic Expre3sions in Algol 60. J.A.C.M. 11, 2, pp.
159-167.