This is a work-in-progress version of the grammar. Everyone who works on the compiler should update this as more details are reverse-engineered from the reference implementation.

Except for the following two changes, which we have agreed, the intention is to mirror exactly what the reference implementation accepts.

Omit the interrupt procedure style.

De-couple shift operators from comparators, and thus allow several shift operators per expression containing them, while retaining limiting the number of comparators per boolean factor to one.

              {phrase}       0, 1, or more occurances of phrase.
              (a | b)        1 occurance of a or of b.
              [phrase]       0 or 1 occurance of phrase.

tinc = {global} {proc} "program" block .

A Tiny program consists of global variable declarations (scalars, pointers, and arrays in any order) followed by procedure declarations, followed by the main program.

block = "begin" {local} {statement} "end" .

Procedure or program blocks are bracketed by "begin" and "end", and local variable declarations must come before any executable statements. Statements are optionally separated by semicolons. Line breaks do not act as statement separators.

global = "#include" string | local .
local = directive | type decl {"," decl} [";"] .

directive = "#define" ident token | "#ifdef" ident | "#endif" .

Directives may appear wherever declarations or executable statements may appear. The only subtle distinction between global and local declarations is that the #include directive may appear at the global level, but not locally within blocks.

Incidentally, directives may not appear in the procedure declaration section (except within the procedures themselves, of course), only in amongst the variables.

type = "unsigned" ("char" | "int") | "char" | "int" .
decl = "*" ident | ident ["[" num "]"] .

Variable identfiers denote either a scalar of the relevant type, a pointer to such a scalar, or a fixed size array of scalars. There are no arrays of pointers, nor pointers to arrays.

Note that arrays are 1-dimensional and start at 0.

The number of elements in an array is specified as a constant, and arrays are indexed from 0 up to (that value)-1.

Variables are of one of 4 types, being "char" or "int", each of which may be either signed or unsigned. Variables are declared as signed if the unsigned prefix is omitted.

proc = "procedure" ident "(" [formal {"," formal}] ")" [block] .

Both in procedure definitions and their calls, the parentheses must be present even if the procedure has no parameters (the assertion that procedures must have at least one parameter is incorrect).

Procedures may not be nested, but may be recursive, including mutually.

Thus "forward" declarations (prototypes) are permitted, and are distinguished by there being no "begin" after the ")". Hence the optional "block" in the above grammar rule.

There are no functions, so any results must be passed through global variables or by writing through pointer parameters (such as int *result).

formal = type ["*"] [ident] | ident .
 or alternatively
formal = formal1 | formal2 | formal3 .
formal1 = type ["*"] ident .
formal2 = type ["*"] [ident] .
formal3 = ident .

In a normal procedure declaration, variant formal1 applies, i.e. both type and identifier must be present for all parameters.

In a prototype declaration, variant formal2 applies, in which the identifiers may (but need not) be omitted but the type information is required.

In the definition of a procedure for which a prototype has previously appeared, variant formal3 applies: the type and pointer indicator must be omitted, only identifiers should appear.

This language feature is clearly designed to dispense with the need to cross-check a procedure declaration parameter list against the types given in a previous prototype. Curiously, there appears to be no actual type checking when parameters are passed, the only check being to make sure the number of parameters is right.

Question: do the parameter names have to match between the prototype and the actual body, if given?

No. Otherwise there might as well not be any parameter list attached to the body. The names in the prototype, if any, are forgotten just as they are when reaching the "end" of a procedure body, only the types (and mode - i.e. whether pointer) are remembered.

In a prototype, if the first name is absent, all the others must be absent, and if the first name is present, all the others must be present. If names are present, they must not duplicate each other.

statement = (if | while | for | "#inline" string | directive | simple) [';'] .

if = "if" "(" boolexpr ")" {statement} ["else" {statement}] "endif" .
while = "while" "(" boolexpr ")" {statement} "endwhile" .
for = "for" "(" simple [";"] boolexpr [";"] simple [";"] ")" {statement} "endfor"

Directives may appear where statements can appear, and in addition to #define/#ifdef/#endif, #inline is also allowed. Statements are either structured if/while/for groups, or simple statements.

Question: although the top-level program block allows for the optional semi-colon between statements, I don't think they're correctly allowed for in this grammar between statements in the if/while/for statement lists? -- My mistake; now fixed (since all instances of statement allow the semicolon it seemed more appropriate to absorb it within).

simple = ident "(" {(string|expr) [","]} ")" | variable "=" boolexpr .

A simple statement is either a procedure call or an assignment. Notice how the comma separating procedure parameters is optional.

Other than as part of #include and #inline, the only place where a string constant can appear is as parameter to a procedure, and then presumably it is passed as the address of the first character.

variable = "*" (hexnum | ident) | ident ["["expr"]"]

A variable is either a pointer dereference or a scalar or an array element.

boolexpr   = boolterm {orop boolterm} .
boolterm   = boolfactor {"&" boolfactor} .
boolfactor = ["!"] relation .
relation   = expr [relop expr] .
expr       = sum {shiftop sum} .
sum        = [addop] term {addop term} .
term       = factor {mulop factor} .

orop    = "|" | "~" .
relop   = "<>" | "<" | ">=" | ">" | "!=" | "=" .
shiftop = "<<" | ">>" .
addop   = '+' | '-' .
mulop   = '*' | '/' .

Question: the definition of sum appears to allow a null entry. I presume the coding is intended to allow for a leading + or - ? -- Yes, my mistake; fixed.

Expressions consist of basic operands combined into expressions using operators of varying levels of precedence. OR and XOR are lowest, multiply/divide highest, with the rest inbetween. In ascending order the others are AND, comparators, shifts, and add/subtract.

Unary NOT is permitted and binds tighter than AND but weaker than comparators. Unary minus (and even plus) is permitted but only at the beginning of a sum expression. It is achieved by basically making the first term in a sum optional, and deeming it to be replaced by the constant zero if it is absent (yes, really, in "a = +b" it actually adds zero to b before assigning it to a).

"|", "~", and "!" are boolean/logical OR, XOR, and NOT, respectively.

"!=" and "<>" are two options for NOT EQUAL.

factor = "(" boolexpr ")"
       | "&" ident ["[" expr "]"]
       | variable
       | num
       | hexnum .

Basic operands, other than parenthesised arbitrary sub-expressions, are either the address of a scalar variable or of an array element, or a normal variable (as above, including pointer dereferencing), or a constant.

Lexing conventions/built-in phrases:

ident: "[A-Za-z][A-Za-z0-9_]*"
num: "[0-9][0-9]*" | charlit
hexnum: "$[0-9A-Fa-f][0-9A-Fa-f]*"

charlit: <single_quote> <any character> <single_quote>
string: <double_quote> <sequence of not double_quote> <double_quote>
How do you encode a double quote in a string? - You can't, but you can put a single quote into a charlit by using three single quotes.

Do you allow newline as a valid string character? - Yes, and into a charlit too.