The Ocode Machine Robert Firth Royal Military College of Science Introduction This document describes Ocode, the intermediate code of various compilers and codegenerators used at RMCS and elsewhere. Ocode is a simple intermediate code that forms the interface between front and back ends of a compiler. It is at a fairly low level, being largely linear, and with type mapping and address translation already performed. The version described here is a development of the Ocode designed by Martin Richards for the BCPL compiler. Currently, compilers to Ocode exist for Algol60, BCPL, Coral66, Fortran, and RTL/2. There are codegenerators for many machines, including Intel 8080 and 8086, MC68000, DEC PDP-11 and VAX-11, CA LSI-2 and NM4, DG Nova, Interdata 7/16 and 8/32. Ocode is intended as an intermediate code between a compiler and a codegenerator. It is not designed for interpretive execution, and to some extent is not suited to it. Further discussion of possible compiler and codegenerator strategies may be found in the companion documents 'Programming the Ocode Machine' and 'Codegenerating Ocode'. Data Types The Ocode Machine recognises three main data types: Integer Floating Address It assumes that an address is the same size as an integer, and that a floating value occupies the space of a whole number of integers. The machine also has a few special operations for handling bitfields and individual bytes, but does not recognise scalar variables smaller than an integer. Boolean Certain Ocode operations take a Boolean operand or return a Boolean result. In Ocode, the type Boolean is not distinguished as such; it is represented by the two Integer values: True -1 (all ones) False 0 (all zeros) Arithmetic and Logical operations are performed indiscriminately on integers and Booleans. Addresses An important concept to the machine is that of a scaled address. The machine assumes the existence of a 'true' hardware address whose manipulation is efficient, and a set of 'scaled' addresses that are possibly less efficient. The essential property of a scaled address is that adjacent variables of the same type have scaled addresses that differ by unity. Each data type therefore has a different scaled address. Indirection is performed on both true and scaled addresses, but address arithmetic is done only on scaled addresses. There are transfer functions between true and scaled addresses. Data Areas The Ocode machine has three conceptual data areas: Global Static Local The basic unit of addressing for data areas is the Ocode Word; that is, a unit capable of holding an integer or an address. Adjacent cells are numbered consecutively, but if the true machine is not word addressed, then consecutive Ocode numbers denote machine addresses that differ by the number of address units in a word, and require modification when generating the corresponding true operand addresses. Global and Local cells are allocated in consecutive sets, thus G10 G11 G12 are three adjacent Global cells, and P10 P11 P12 are three adjacent local cells. If an Ocode object occupies more than one cell, it is always addressed by the cell lowest in true machine store. The front ends must therefore be aware of the stack direction. Global Vector The Global data area is called the Global Vector. It consists of a sequence of cells, consecutive and in ascending order of true addresses, numbered from 1 to some small maximum, usually 399. All segments of an Ocode program share the same Global vector, which is assumed to be allocated in the run-time system and to exist throughout the life of the program. There are Ocode directives to effect load time initialisation of Global cells. Static Cells Static cells are local to individual Ocode segments, but exist throughout the life of the program. They are allocated by special Ocode directives, that set up named labels to refer to them. Labels are numbered from 1 upwards, and allocated consecutively by front ends. There are several allocating directives, that permit the static space to be partitioned into read-only and read-write areas, and into scalar and array areas. No relation should be assumed between the true addresses of two data labels, unless of course the codegenerator can compute one. Static cells may be initialised at load time by further Ocode directives. Local Area Each incarnation of a procedure has its own local area, consisting of a set of contiguous cells numbered from 0 upwards, though the first few are normally used to hold dynamic linkage information. Local areas are allocated at run time from a stack. Since on different machines the stack may grow towards either high addresses or low addresses, local cells may be in either ascending or descending order of true addresses. The Stack The Ocode machine has a single linear stack for local variables and temporaries. It resembles a zero-address machine, in that all operations take their operands from the stack and leave the result there. One of the assumptions of Ocode is that the local storage needed by a program can be calculated at compile time at the procedure level, and therefore that the stack need be changed only at procedure entry and exit. Moreover, the procedure calling sequence is best suited to an 'open' stack implementation, in which the calling procedure indicates at the point of call how much local storage it is currently using. Load and Store Instructions Load instructions always place the value on the top of the stack, and store instructions take it from the top of the stack. Global ------ LG g load an integer value from global cell g SG g store an integer value into global cell g LAG g load the true address of global cell g LGF g load a floating value from g SGF g store a floating value into g Static ------ LL x load an integer value from the static cell labelled x SL x store an integer value into x LAL x load the true address of x LLF x load a floating value from x SLF x store a floating value into x LIL x load an integer value from the cell whose true address is in x SIL x store likewise LILF x load a floating value from the cell whose true address is in x SILF x store likewise Local ----- LP p load an integer value from local cell p SP p store an integer value into p LAP p load the true address of p LPF p load a floating value from local cell p SPF p store a floating value into p LIP p load an integer value from the cell whose true address is in p SIP p store likewise LIPF p load a floating value from the cell whose true address is in p SIPF p store likewise Non-Local --- ----- FRAME f the next 'Local' instruction is to be interpreted as referring to a non-local dynamic cell, thus f = 1 the cell is in the lexically enclosing procedure f = 2 the cell is in the enclosing procedure two levels out ... f = -1 the cell is at the outermost level f = 0 the cell is local after all LEVEL f load the stack pointer of the enclosing procedure denoted by f (f is interpreted as for FRAME) Notes: It is assumed that non-local references are infrequent The FRAME instruction should be immediately followed by the 'local' instruction, and refers only to that instruction The form FRAME -1 should always be generated for references to the outermost level, from any depth of nesting LEVEL f is equivalent to FRAME f; LAP 0 but may be optimisable Most Ocode implementations obtain access to outer frames by following a 'static chain', but keep the outermost frame pointer also in a global, allowing fast access to frame -1 Absolute -------- LIN k load an integer value from the cell whose true address is k SIN k store likewise LINF k load a floating value from the cell whose true address is k SINF k store likewise Notes: Absolute addresses are idiomatically 'indirect number', and indeed Ocode indirection on a pure number is another way to generate them Constants --------- LN n load the integer value n LNF f load the floating value f, where f is a character string in standard scientific form, eg +314.159\-2 LSTR n c1 c2 ... cn load the string constant of length n whose characters are given (in decimal) by c1 .. cn, eg LSTR 4 65 66 67 68 loads "ABCD" TRUE load the constant TRUE FALSE load the constant FALSE LFZ load floating zero LFI load floating INFINITY QUERY load an arbitrary integer value Notes: Strings, and often floating numbers, might have to be stored in a separate data space, and hence loaded indirectly from labelled data cells. Since front ends allocate labels from 1 upwards, back ends should allocate such cells from some maximum (eg 999) downwards QUERY is used to initialise some undefined locals, or to represent missing actual parameters, and should be implemented in whatever manner is most efficient Operations ========== All operations take their operands from the stack, with the left operand the lower down, and place their result on the stack Operands preceded by * are monadic; all others are diadic Integer ------- PLUS addition MINUS subtraction MULT multiplication DIV division REM remainder * NEG negation Notes: The Ocode machine normally does not recognise integer overflow Division and remainder must be consistent, but their effect for negative operands is not otherwise defined Multiplication is defined to generate a single-length product Comparison ---------- These all take two integer operands and yield the Boolean result TRUE or FALSE EQ equality NE inequality LS less than GR greater than LE not greater than GE not less than Notes: See also JT, JF Logical ------- LOGAND bitwise disjunction LOGOR bitwise conjunction LSHIFT left operand shifted left by right operand RSHIFT left operand logically shifted right EQV bitwise equivalence NEQV bitwise nonequivalence NAND LOGAND of left operand with complement of right * NOT bitwise complement Notes: The right operand of LSHIFT, RSHIFT may be assumed non-negative Floating -------- These resemble their integer equivalents, but perform the corresponding floating operations. The arithmetic operators return a floating result, and the comparisons a Boolean PLUSF MINUSF MULF DIVF * NEGF EQF NEF LSF GRF LEF GEF Notes: The Ocode machine recognises two singular points of the floating domain: rational zero and rational infinity. These values are also denoted by LFZ, LFI dynamically and ITFZ, ITFI as load time initial values It is assumed that underflow during floating arithmetic is always mapped onto rational zero Depending on the machine, overflow may either cause an exception or be mapped onto rational infinity Exponentiation -------------- IPOWER raise the floating left operand to the power given by the integer right operand, with floating result POWER raise floating to floating power Notes: IPOWER with a small constant right operand is usually optimisable; otherwise, out-of-line code is normally required Conversions ----------- * FIX floating to integer * FLOAT integer to floating * RFLOAT float the next-to-top of stack, leaving the top unchanged Notes: Observe that RFLOAT may nevertheless imply moving the top of stack value, since the one below it may have changed in size Other Operations ===== ========== These are put here mainly for convenience Bit Field Operations --- ----- ---------- BITSRV tb bp Take the value from the top of stack, extract from it an unsigned bitfield, and replace the result, right justified and zero extended, on the stack The bitfield is of size tb bits and its rightmost bit is at position bp. Bits are numbered from the right starting at 0 (This is the Coral BITS[tb,bp] operation) SIGNRV tb bp as above, but the bitfield is signed and must be sign extended into the full word BITSLV tb bp the rightmost tb bits of the value next to top are stored in the bitfield [tb,bp] of the variable whose true address is on the top, and both operands are removed from the stack Notes: It is considered good codegeneration to make the assignment implied by BITSLV indivisible, so that parallel operations on disjoint bitfields in the same word do not interfere Code Statement ---- --------- The directive CODE introduces a section of embedded machine code. The code follows the directive in the form of a sequence of decimal numbers representing the separate characters of the code sequence. Within this, there are two special trip characters TRIP1 Ocode address follows TRIP2 End of code sequence At present, these are 128 and zero, respectively The Ocode address is a pair of decimal numbers, the first representing a character and the second an integer, thus G n global n L n static n P n local n Q n nonlocal n A n absolute address n N n number n F n frame n All but the last should be replaced by the corresponding Assembler address. The Frame cue should cause code to be emitted to load the stack frame of the given outer lexical context; a subsequent Q operand refers to that level and assumes the frame has been loaded. It is the responsibility of the programmer to generate Frame instructions in the code whenever he needs them The exact conventions for inserting machine code in Ocode are of course machine dependent; however, there are certain general rules that are widely observed. Normally, the codegenerator may assume the code insert will not corrupt the registers or data structures that implement the Ocode machine, for instance the stack pointer, but that it will destroy all the accumulators Miscellaneous ------------- REV reverse the top two operands on the stack Address Scaling and Indirection ======= ======= === =========== These operations are often optimisable. In particular, front ends sometimes emit scaling operations and then rescaling ones, possibly with a numerical operation in between. Also, scaling followed by indirection can be turned into a machine indirect address mode; indirection after addition can be turned into a base-offset address mode, and so on Address Scaling ------- ------- These are all monadic operations on the top of stack ATOI convert true address to scaled integer address ATOF floating ATOB byte ITOA convert scaled integer address to true address FTOA floating BTOA byte ITOF convert scaled integer address to scaled floating address Notes: If all objects are stored in natural alignment, then scaling and rescaling operations are simple shifts. Otherwise, more complex code may be necessary, including rotates The operation ATOB assumes that every Byte in the machine store can be given a unique integer address. On some machines, eg NM4 or Mod1, this assumption is false Obsolescent Operations ----------- ---------- The operations LLG g LLL x LLL p are still generated by some front ends. They are equivalent to LAG g; ATOI LAL x; ATOI LAP p; ATOI and so generate scaled integer addresses Indirection ----------- The RV operations are fetches, take a scaled address from the stack, and load the addressed value instead The STIND operations are stores, and store the value next to top in the scaled address on the top, removing both operands RV load an integer value from the given scaled address RVF floating RVB byte STIND store an integer value in the given scaled address STINDF floating STINDB byte RVTF load a floating value from a scaled INTEGER address STINDTF store a floating value into a scaled INTEGER address Notes: The anomalous RVTF, STINDTF operations are used to access floating elements in packed data structures such as Coral Tables Array Accessing ----- --------- INDEX This is equivalent to PLUS, but is emitted by front ends when the right operand is known to be an array root. This permits some optimisation Jump Table ---- ----- RVS This takes two operands. The next to top is an integer index, and the top is the true address of a jump table. Both operands are replaced by the true code address referenced by the appropriate element of the jump table This is a special instruction because many machines permit an economical form of jump table, eg using relative offsets instead of absolute addresses. See also the ITEML directive Transfers of Control ========= == ======= In the Ocode machine, code is labelled by special directives, that set code labels at specific points in the instruction stream. Transfers of control will name these labels as their destinations. Because all high level control structures are broken up by front ends, many labels do not appear explicitly in the original source, but are compiler generated, as are many transfers of control. To permit good optimisation, Ocode distinguishes many types of label and control transfer. However, front ends allocate unique numbers to all labels, regardless of type, and in an arbitrary order Code Labels ---- ------ LAB x LABR x set label x at the current code position LABX x The difference is LAB this is a compiler generated label with no backward jumps LABR this is a compiler generated label with some backward jumps LABX this is a source language label The Ocode machine permits simple jumps out of blocks, but not out of procedures. Given that the dynamic stack is allocated at procedure level, simple jumps are usually very efficient LABEQ x y This is a directive to indicate that x and y are the same label. It is for the convenience of one-pass front ends Jumps ----- These are compiler generated jumps to code labels. The label is given by an explicit number, but the special number 0 denotes a jump to the immediately following instruction JUMP x jump to code label x JT x jump to x if the top of stack is TRUE JF x x FALSE Notes: By convention, JF will jump only if the top of stack is FALSE, ie 0, but JT will jump if it is anything else, ie nonzero. In any event, the value is popped JT, JF can usually be combined with a preceding comparison, to avoid generating a true Boolean value that will immediately be popped Result Jumps ------ ----- The Ocode machine has a small extra data region where values can be stored across transfers of control. Since such transfers may reset the stack top, that place cannot be on the stack. It is usually one or more true machine registers RES x save the integer value on the stack and jump to x FRES x save the floating value likewise DRES x save the top two integer values likewise RSTACK n reset the stack to n; then load the saved integer value RFSTACK n likewise with the saved floating value RDSTACK n likewise with the two saved integer values Notes: The idiom RES 0 is permitted, meaning a result jump to an immediately following instruction. This must usually copy the saved value to the special data area, since a stack resetting directive may come next Gotos ----- Explicit source language jumps are implemented by these operations GOTO jump to the true address given on the stack LONGJUMP the top of stack is a stack frame pointer and the next to top is a true address. Reset the Ocode stack to the one and jump to the other SWITCHON n d c1 x1 c2 x2 ... cn xn Compare the top of stack with the n values c1 .. cn. If it is equal to any of them, jump to the corresponding label of x1 .. xn. If it is equal to none, jump to label d Notes: LONGJUMP implements jumps out of procedures. The right operand will usually have been generated by a LEVEL operation SWITCHON is of course the high level language CASE statement. The front ends are responsible for seeing that the values ci are all distinct, but they may be dense or sparse, and the back ends should generate appropriately compact code for all value ranges Stack Resetting ----- --------- The Ocode stack has a base pointer, changed only at procedure level, and a conceptual top-of-stack pointer where operands are stored. On most true machines, there will be no actual TOS pointer, since the stack size is always known, and an explicit offset from the base pointer can be used Similarly, most operands will not actually be loaded on the stack, and most stack setting directives will not generate code STACK n set the top of stack to n This is used both to increase the stack size, for instance when more local variables have been declared, and to reduce it, for example at the end of an inner block. The stack is normally reset at labels, and at RSTACK points where a saved value has been propagated STORE store local operands This is emitted after a sequence of Load instructions, and is a signal that the values loaded should indeed be put into their correct places in the Local data area Procedure Call --------- ---- The Ocode machine has a stylised procedure call, thus 1. Space is reserved on the stack for a 'call frame', a space to hold linkage information 2. The actual parameters are loaded above this 3. The procedure to be called is loaded 4. The stack is set to the new frame 5. The procedure is called 6. On return, the stack is reset, either with or without a propagated result The actions (2) and (3) use ordinary load directives; the others are special MARK n Raise the stack to n. This is just like a STACK n directive, but is emitted only to create a new call frame RTAP m Call a routine whose call frame starts at cell m. Note that the current stack value will be higher, since the actuals will have been loaded This operation must also effect the stack movement, either explicitly or by planting information that will enable the called routine to do so The routine to be called is the value on the top of the stack RTAP 0 m Exactly as RTAP m, but the preferred form RTAP 1 m Call the routine that is next to top, passing the top value as the lexical environment pointer (static chain) FNAP m FNAP 0 m FNAP 1 m Likewise call an integer function FFNAP m FFNAP 0 m FFNAP 1 m Likewise call a floating function On return from a procedure, the stack must be reset to eliminate the parameters and call frame, and any result must then be loaded Procedure Body ========= ==== A procedure body is bracketed by Ocode entry and return instructions. It is the responsibility of the front ends to ensure that procedures are reached only by proper call instructions Entry Point ----- ----- ENTRY n x c1 c2 ... cn The label x is the entry point of a procedure named c1 .. cn. The name is given as a sequence of decimal numbers as for LSTR Every procedure has a unique entry point, so defined Initial Context ------- ------- SAVE n Set the procedure local stack initially to n It is assumed that P0 .. Pn-1 contain the linkage information and any parameters, and the necessary entry code should be emitted to effect this. The initial top of stack is therefore N, that is, Pn is the first free cell STARTPROC e t1 t2 .. tm 0 n Set the procedure stack initially to return link e=0: no static chain is required e=1: a static chain has been passed ti=1: next parameter is an integer ti=2: next parameter is a float 0 : end of parameter list n : initial top of stack after all the above Appropriate code should be generated to effect this. Every high level language procedure will begin with one of these directives, immediately after the ENTRY. Since links, parameters, static chains and the rest are passed in many ways on many machines, the code sequence will be peculiar. Exceptionally, a procedure will consist only of an Entry directive followed by a Code statement. Such a procedure has been written in machine code using some code insertion facility of the high level language. In this case, the codegenerator must assume that all entry and exit code has been provided by the programmer, and should emit nothing other that the entry label and the code statement Note that, since in general it is not possible always to know whether a procedure requires a static link, STARTPROC 1 does not guarantee that a true static link has been passed, and RTAP 1 does not guarantee that the link will be stored by the called procedure. It is best if the static link is put somewhere harmless on call, and retrieved if needed by the called procedure Return ------ RTRN return from routine (no result) FNRN return from a function, saving TOS as integer result FFNRN likewise with floating result The result is picked up by the caller after FNAP or FFNAP. It is the responsibility of front ends to retrieve floating results only from floating functions, but note that it is legal to throw a result away, so a call by RTAP may match a return by FNRN or FFNRN End Cue --- --- ENDPROC s x This is the end of procedure whose entry label was x, and the code uses at most s words of dynamic stack Data Directives ==== ========== Ocode sets up static data space by labelling and initialisation directives. These are normally converted into the appropriate load-time directives for the true machine. The pattern is, that labels are set in data space by appropriate ...LAB directives, and all subsequent initialisations apply to successive cells in that data space Labelling Directives --------- ---------- CONSTLAB x set label x in readonly scalar space DATALAB x set label x in readwrite scalar space ARRAYLAB x set label x in readwrite array space STRINGLAB x set label x in readonly constant pool space StringLab is never emitted by front ends; it is provided as a standard way for back ends to generate string or floating constants in data pools Initialisation Directives -------------- ---------- ITEMB b set the byte value b INTMN n set the integer value n ITEMF f set the floating value f SPACE k reserve k words without initialisation These are used to create initialised or uninitialised static data. By convention, scalars are never uninitialised ITZ set integer zero ITM set the largest integer ITFZ set floating zero ITFI set floating infinity These directives set special values of the corresponding domains. They are provided because it may be less convenient to recognise such values directly ITEML x set the true address of label x ITEMS n c1 c2 ... cn insert the string c .. cn in the string pool, and set here its Ocode (usually scaled) address These directives should be used with circumspection. Not all machines permit the load time setting of scaled addresses, and on some machines ItemL must set up a relative or compressed address. If a jump table is set up by ItemL directives, then the RVS instruction must be coded to interpret them properly ROOT ... This is currently a rather inept attempt at a directive to indicate that a static array has been set up. It is intended to permit various codegenerator optimisations, but should be revised Linkage ======= Linkage is accomplished by directives that initialise the Global vector. These directives must generate some appropriate link time action that results in the complete vector eventually being set up SETGV g v set integer v in global cell g SETGL g x set true address of label x in global cell g In addition, globals above the top of the vector are often treated specially, for example as direct external symbols Cross Reference ----- --------- These are useful if the codegenerator produces symbolic Assembler code. They allow the user to make rather more sense of the output LINE k This code came from source line k. The output should be so annotated In addition, it is helpful to annotate important parts of the code, such as entry points, call and return points, Gotos, and code statements XREF a n The Ocode address a corresponds to the source name n. This directive is terminated by a newline. If a cross reference file is in use, generate a cross reference by giving the Ocode address and the source name. The latter should be copied directly from the Ocode; the former is given by a character and a number, thus Gn global n Ln static n Pn local n An absolute address n Nn number n This is the same nomenclature as in Code statements, but the character is given directly and not by its decimal equivalent Terminators ----------- SEGEND This is emitted at the end of a segment. Further segments may follow END This is the end of Ocode input. However, most front ends do not emit it, and end-of-file is equivalent NONE This is provided to give back ends a standard representation for the Ocode operation with no effect -------------------------------------------------------------------------------- Robert Firth RMCS Shrivenham February 1982