ELLIOTT 803 Algol 60 Compiler

The ELLIOTT 803 Algol 60 Compiler

The pages in this directory comprise the results of prolonged research into the Algol 60 compiler produced by Elliott Brothers between 1960-1962. The main sources for this research were the Algol 60 User Manual, issued by Elliotts, and two paper tapes which comprise the compiler and run-time system. The tapes will be referred to as "Tape 1" for the compiler, and "Tape 2" for the run-time system. These terms were in common use when the compiler was being used, and the tapes were clearly labelled with A104 TAPE1, or A104 TAPE 2.

The tapes had been transferred to more modern material - they are now on my hard disk. The manual is labelled "Issue 4" and dated "January 1965". I have not been able to determine the date of origin of the compiler tapes, so I am not sure if this was the final state of the compiler. There is, however, some evidence that this was referred to as Issue 3 of the compiler. This is revealed in the tape itself, which contains facilities for applying patches, and re-punching the modified system onto tape. The tapes always had a legible heading, which has been discarded from my copies, but since punching an updated tape included the code for producing these legible headings, the data used is also there and clearly states "A104 TAPE 1 ISSUE 3". It is certainly not the first issue, as it exhibits extensive evidence of 'patching', which was the technique used to make corrections in response to fault reports etc.

I will discuss the techniques used in reconstruction in more detail later.

The information contained in the pages here was semi-automatically constructed from the source files which I have constructed from these binary tapes. As a result some sections are not as clear as I would like. I hope to gradually improve this, and hope that the links between sections will be useful.

I have departed from an authentic representation, which was extremely cryptic, and difficult to follow. I have devised a macro assembler which uses mnemonics for most of the instruction opcodes, and provides symbolic names for labels and variables. The macros are fairly simple ones, and are used to simplify commonly used constructions. For a list of the original opcodes and my mnemonic equivalents, Click here. For an introduction to the techniques and an index into the Compiler itself (Tape 1) see the Compiler Index.

For the equivalent information for the Run Time System (Tape 2) see the RunTime System Index.

The process of reconstruction started by examining the tape structure. In use the tapes could be read into the 803 using the Initial Instructions. It was therefore fairly simple to read the first section of Tape 1. After ignoring leading blanks, the first 8 5-bit characters are packed into a 40-bit word. This defines the load address for the next section. This section consists of a series of words, each represented by 8 characters, which are loaded into succesive memory locations. One trap to beware of is blank characters, which may or may not be significant. The Initial Instruction routine doesn't count the 8 characters - it simply reads characters until an overflow occurs. Since the 803 word has only 39 bits this means that the 40 bits making up a word can have the 40th bit set to cause the overflow. If the top bit in a word is a one, then the extra bit is not needed, but may or may not be set. This depended on which program was used to generate the binary tape.

In order for the Initial Instructions to terminate, the code loaded is always read into the top of store. As a result, when the address wraps around the top of store, it overwrites the load address and causes a 'trigger' which modifies the instruction that stores each word into a jump instruction, thereby entering the program just loaded.

Knowing this much, it was fairly simple to write a program to simulate the II's and load the first section into a virtual memory. This was then printed out in a simple instruction format and could be examined to determine what it did when entered by a Trigger.

This first section forms the file called boot.t2. It is a simple bootstrap loader, which takes advantage of the fact that the Algol Compiler would only work if the Floating Point Option was installed. This had a fast left shift instruction, and this coupled with a loop-unrolling technique meant that the bootstrap loader could pack up the characters being read in slightly faster than the tape reader could deliver them. This was a problem with II as the slower shift instruction used there meant that it took slightly longer to pack a character than it took the tape reader to read the next character. Since the tape reader had only a single character buffer, this meant that the tape reader had to apply the brake, and this brought in a more significant delay before the following character could be read. It also made an awful noise! Since the Algol compiler was a pretty long tape, any improvement in load time was significant, so this special fast bootstrap was most welcome to computer operators.

The remainder of the tape was structured as a series of blocks, each with a load address and a checksum. This meant that if a tape got damaged it would be detected during loading and the damaged tape could be discarded and replaced. It was customary to keep a Master Copy of the the two tapes, which were only used if the working copies had all been damaged. Normally, additional working copies were also kept and were used to make further copies when needed (or when the operators had nothing better to do!).

The reconstruction effort then progressed to reading these subsequent blocks and loading them into a virtual store. They could then be output as assembler code and the logic of the compiler was slowly deciphered.

The simple disassembly program, was extended and a mapping file was constructed. This contained a list of symbols and addresses, along with some flags. This allowed me to specify which sections of memory were instructions, which were data. As sections of code started to make some sense, I was able to devise symbolic names for variable and labels, and these were entered into the file. The disassembler could then start to interpret the address part of instruction, substituting variable and label names for the addresses. After a while I had large sections of code labelled and the flow through the code became intelligible.

At some point it became advantageous to start adding comments into the code, and at this point, the disassembler became a hindrance. I had, by this time written an assembler which would take the source files produced by the disassembler and reconstruct a core image. A further program enabled me to compare the two core images and so detect any discrepancies that might creep in. This became a vital part of the reconstruction work, as simple edits could cause things to differ slightly from the original and these deviations could be quickly remedied.

At this point, I abandoned the disassembler, taking the final output from it as the starting point for manual editing of the source files. I could not have got this far without it, but it had served its purpose, and was no longer needed.

At some point in the proceedings, I split the source code into individual files, based on what appeared to be logical divisions. This was not entirely succesful, as there were some rather odd features in the code. To discuss this further, I need to describe the original coding scheme and how the code was maintained.

As far as I have been able to determine, all code produced by Elliotts for the 803 was written using a very simple assembler called A2, or later A102. These had identical formats, the later version being extended to support later optional features such as floating point numbers. The word assembler is perhaps not justified, as it relied on the coder to resolve most of the addressing that modern assembler perform.

Instructions were written on a standard coding form that had 7 columns: the first being a relative address, which started at 0 and went up to 29 on each page. If you needed more than 30 lines, you modified the numbers on subsequent pages to be 30-59, or even 60-89. Column 2 contain a two digit opcode, 3 was an address which was a decimal number, which could be follwed by a comma and possibly a further number. Column 4 was narrow, you could put a colon ':' or slash '/' in it. If left blank that was assumed to mean the same as colon. Column 5 was another opcode and column 6 was another address. Finally column 7 could be used to write comments.

The interpretation of the above was columns 2 to 6 were punched onto paper tape for loading, columns 1 and 7 were just there to aid the programmer and were not punched. Each 803 word could hold two 19-bit instructions, with the extra bit between then to indicate that the content of the address of the first instruction was used to modify the second instruction. The colon signified no modification, the slash meant modification took place.

The address parts of the instructions could be either absolute, in which case there was no comma, or relative, in which case there was a comma. If the comma was followed by a second number, then this was a block number, otherwise the address was relative to the current block of code. When a program was designed and coded it would be written as a series of blocks, which would be prefixed by a list of absolute addresses which determined where the blocks would be loaded. These addresses would then be used to convert the relative addresses to absolute ones.

I have transcribed a page of Elliott coding - the original scanned image was very hard to read - the ink had faded and the paper was dirty. By careful examination it was possible to reconstruct this page, using modern fonts which approximate to the original. The 'curly' script was originally hand-written text, and the arrows and lines were also done by hand.

It is obvious that this was a fairly primitive system, but it was sufficient at the time, and it was only when Elliotts moved on to larger, faster machines that the need for a more sophisticated system arose.

The Algol 60 Compiler was a large program by 803 standards, with over 4000 words of code. I estimate that the compiler had about 50 blocks and the run time system about 20. Keeping track of changes in a system like this would be difficult, and when they had to make fixes to correct problems, Elliotts avoided changing the memory layout wherever possible. As a result they made corrections by patching. This means that when a few instructions had to be added into a routine, they would replace an existing instruction with a jump to some other part of the store, where the new code, plus a jump back to the original routine would be stored. In places you can find a patch itself being patched with a further fix. Often it is not clear what is original code and what is patch!

In order to impart a modicum of readability into the code, I have adopted a technique wich works for this problem, but I would not recommend as a techique for general use. My assembler allows me to modify the load address to some arbitrary value. When I find what appears to be a patch, I move the code which provides the patch to be directly in line with the original code, then resume where I was before the switch, if that is appropriate. I have used a macro to implement this in most cases. The macro includes a jump to the new address, and a switch of program address to that point and a corresponding label for the jump.

My Assembly Language

The asembler program that I now use has evolved during its lifetime and has now reached some degree of complexity. This is partly due to the fact that I tend to leave it a while, then come back and try to modify it without remembering how it really works! The most recent change is to maintain a global dictionary, but permit compilation of individual component files. The main justification for this was to be able to produce the .html files for display on these web pages. I wanted each section to be a separate web page, and so it was neccesary to be able to assemble individual files. In the earlier stages, I was assembling and checking the full compiler each time I made a change. This produced a single listing file, which was OK for that purpose, but was too bulky to use as a web page.

The language that I am using, I call .t2 format, largely due to a misunderstanding. The original Elliott assembly program was A2, replaced by A102 when the floating-point instructions were added, so I guess it should be called .a102 format. I think that T2 was a bootstrap loader and so would have been a binary format. In any case, it bears little resemblance now to the format of the Elliott assembler, but the .t2 name persists.

In the original format, each line constituted a word in the 803 store. I prefer to work with one instruction per line, except where modification takes place, and then I include both instructions on a single line. It is also useful to do this for subroutine calls, since they must be correctly aligned together in a single word. Unfortunately, such pairs mess up the layout on my web page listings and the second instruction is squeezed into the address field of the first instruction of the pair. Maybe I'll be able to devise a better way of doing this at some point.

The assembler will automatically pack instructions two to a word and this can be seen in the Address column of the listings: the addresses in column 2 show a '+' sign if this is the second instruction in the word. This convention can also be used in the instruction address field (column 6).

If the assembler encounters a pair of instructions at point where it is expecting the second instruction of a pair, it will insert a no-op (00) instruction to bring the code into alignment. This was done manually in the original format, but often the programmer would insert a jump instruction, as this was slightly faster than a no-op.

The original A2/A102 format used two octal digits to specify opcodes. I have adopted mnemonics wherever I have been able to devise reasonable ones. The Elliott 803 instruction set is rather interesting, and there are some opcodes that defy a simple mnemonic, and for these I resort to the two octal digits, prefixed by the letter 'o'.

I have also introduced a macro facility. The macros as stored in a separate file called macro.t2, and this is read in before processing the source file. Any macros defined in this file are expanded and processed during assembly. The expansion of these macros appears following the line where they are called, and the line numbers in column 1 of the listing show the original line number, a '+' and a line count within the macro.

The original A102 could only accept numeric address fields to instructions, but I permit symbolic identifiers to be used, with simple modification if desired, mostly +/- a constant. There is a system to the naming, though it may not be obvious at a first glance. Global symbols begin with a letter, local symbol with a full stop '.'. What happens internally, is that any symbol beginning with a full stop is prefixed by a block name. Block names look like labels, but have two colons following them. It is valid to include a full stop within an identifier, so it is quite acceptable to fully qualify an idenitifer with block.name. This means that all symbols have global scope, but the local ones can be abbreviated within their domain.

A further subtlety occurs within a macro expansion. This becomes a sub-domain with the prefix being unique to the macro invocation. Since it is not easy to determine a global invocation counter, the name also includes a numeric file number to avoid clashes between macro calls in different files.

There are probably a number of odd quirks that I have included to enable me to reproduce exactly the bit patterns of the original code, but with a reasonably legible format. If you see something that doesn't make sense, drop me an e-mail and I'll try to explain it!

Checking the validity of the reconstruction

I have already metioned that I wrote a program to check the store images to ensure that my reconstructed version matched the original exactly. One problem that frequently arose, particularly when moving patches around, was that there would be a major difference in one area, and this could be difficult to determine what was going on. In order to simplify this, I pursuaded the disassembler to add, as a comment, the address from which each instruction had been found. This address was prefixed by an '@' character, to distinguish it, and the assembler was modified to check for this and verify that the current address matched that specified. This made it very easy to spot where the mistakes had been made. Once the code had settled down, I removed roughly three-quarters of these, so they now only appear on about one line in four. If they don't appear, the assembler assumes that things are OK, and only complains if it finds such an address which doesn't match up. These comments appear in the far right column of the listing pages.

One thing which puzzled me initially were odd words which contained an unusual value, which eventually I decided must be addresses stored by the subroutine linkage instruction (73). It hadn't dawned on me that this actually stores the return address twice, the second copy being shifted left by 24 bits (if you really want to know more, e-mail me). It would seem that whoever was given the job of producing the compiler tapes for distribution must have loaded the compiler, run a test program or two, then dumped out the compiler using a binary dump program written for this application. Otherwise, I'm sure the programmers who coded the compiler would have stored zero in these link locations.

If you want to know more about the compiler, or think I've got something wrong (which is more than likely), do e-mail me (bil 'at' beeb.net). I'm always happy to discuss things that are interesting!

Page created by Bill Purvis, last updated: 8th July, 2004

You are visitor number 215