The aim of the game, I think, is to try to improve on the reference compiler (RC). There are two obvious areas where something can be done.

The first of these relates to the way identifiers are stored. As far as I can tell, the RC uses four distinct tables of identifiers, stored as strings, which are scanned linearly, using strcmp, each time an identifier is looked up. There is one table for global variables, one for locals, one for procedure parameters, and one for #definitions.

The conceptual simplicity of this has a certain appeal, and it's acceptable only because the RC is a mere teaching example, focusing on structure without too much emphasis on detail. In the old days it would have been frowned upon as hopelessly inefficient. That today's computers are much faster is some excuse, but nevertheless some injection of elegance would not go amiss.

Some way of reducing the number of comparisons per search would be nice. It could be something as simple as adding a sorted array of pointers as a layer on top of it, and using this to do a binary search each time. It could involve the traditional approach of using a hash table. My favourite method is the TrieScheme. And of course I would amalgamate the four tables into one.
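
To make the trie idea concrete, here is a minimal sketch of such a scheme in C (the node layout, the character coding, and all the names are mine, not taken from the RC):

    #include <stdlib.h>

    /* Map identifier characters onto small indices:
       'a'..'z' -> 0..25, 'A'..'Z' -> 26..51, '0'..'9' -> 52..61, '_' -> 62. */
    static int code(char c) {
        if (c >= 'a' && c <= 'z') return c - 'a';
        if (c >= 'A' && c <= 'Z') return 26 + (c - 'A');
        if (c >= '0' && c <= '9') return 52 + (c - '0');
        return 62;
    }

    struct TrieNode {
        struct TrieNode *child[63];
        void *symbol;              /* the identifier's attributes, or NULL */
    };

    /* Find (or, when insert is nonzero, create) the slot for name. */
    void **lookup(struct TrieNode *root, const char *name, int insert) {
        struct TrieNode *n = root;
        for (; *name; name++) {
            int i = code(*name);
            if (n->child[i] == NULL) {
                if (!insert) return NULL;
                n->child[i] = calloc(1, sizeof *n->child[i]);
            }
            n = n->child[i];
        }
        return &n->symbol;
    }

The cost of a lookup is proportional to the length of the identifier, independent of how many identifiers are already stored. And with the four tables amalgamated into one, the symbol slot could hold a small stack of meanings, pushed and popped as scopes open and close.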

The second is to loosen somewhat the close coupling of parsing and code generation. The RC generates code as it parses, which again benefits from conceptual simplicity, but this has the drawback of being inflexible, and makes it generate really awful code in the specific case of for loops. The code it generates for, say,

for (i=1; i<9; i=i+1;) <loop body> endfor;

is as follows:

   i = 1;
A: Test i<9;
   Goto D if not;
   Goto C;
B: i = i+1;
   Goto A;
C: <loop body>
   Goto B;
D:

This sequence (leaving aside that it might be better to replace

Goto D if not;
Goto C

with

Goto C if so;
Goto D;

because that would involve one less jump per iteration) is more or less forced upon the RC by the fact that the code for each of the three parts inside the for-brackets must be generated in the order in which they appear.

A slightly better sequence to generate would be:

   i = 1;
   goto A;
B: i = i+1;
A: Test i<9;
   Goto C if not;
   <loop body>
   Goto B;
C:

Even better:

   i = 1;
A: Test i<9;
   Goto B if not;
   <loop body>
   i = i+1;
   Goto A;
B:

These are only possible if code generation is detached from parsing, for instance by building a parse tree first and generating code from it afterwards. Doing so has other advantages as well: it makes certain optimisations possible, not only the folding of constant expressions but also, through flexibility in the order in which sub-expressions are evaluated, the use of fewer registers to evaluate an expression.
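
As a sketch of what the detached arrangement might look like (the node layout and names here are hypothetical, not the RC's): the parser builds a tree, and a separate pass can then fold constant sub-expressions before a single instruction is emitted. With a whole for statement held as a tree, the generator is likewise free to emit the test and the increment in whichever order it pleases.

    struct Node {
        int op;                    /* operator character, or 0 for a constant leaf */
        int value;                 /* meaningful only when op == 0 */
        struct Node *left, *right; /* NULL for leaves */
    };

    /* Fold constant sub-expressions bottom-up, before code generation. */
    struct Node *fold(struct Node *n) {
        if (n == NULL || n->op == 0) return n;      /* leaf: nothing to do */
        n->left = fold(n->left);
        n->right = fold(n->right);
        if (n->left && n->right && n->left->op == 0 && n->right->op == 0) {
            switch (n->op) {
            case '+': n->value = n->left->value + n->right->value; n->op = 0; break;
            case '-': n->value = n->left->value - n->right->value; n->op = 0; break;
            case '*': n->value = n->left->value * n->right->value; n->op = 0; break;
            /* division left out here, to avoid folding a divide-by-zero */
            }
        }
        return n;                  /* folded nodes are now constant leaves */
    }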

Typically, when a compiler runs out of registers in the middle of an expression, it will resort to storing intermediate results in memory temporarily; the RC, however, just gives up and reports that the expression is too complex.

Consider the expression a[b+c*d] - e[f+g*h], which when converted to Reverse Polish becomes abcd*+[efgh*+[-. Evaluated naively, strictly left to right, this needs 5 registers.
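
Tracking the registers held after each step of such a strict left-to-right evaluation shows where the peak comes from (each operand loads into a fresh register; each operator frees one):

    Step: a  b  c  d  *  +  [  e  f  g  h  *  +  [  -
    Held: 1  2  3  4  3  2  1  2  3  4  5  4  3  2  1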

A cleverer method would realise that b+c*d can be evaluated in 2 registers rather than 3, by doing

Load C
Load D
Multiply
Load B
Add

instead of the lame

Load B
Load C
Load D
Multiply
Add

In fact, the perfectly innocuous-looking statement

  z = a[b+c*d]-e[f+g*h];

is enough to make the RC give up, whereas re-writing it as

  z = a[c*d+b]-e[g*h+f];

is enough to make it cope.
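
The trick of evaluating the register-hungrier operand first generalises: it is essentially Sethi-Ullman numbering. A minimal sketch, assuming a binary expression tree and treating indexing as just another binary operator (the node shape here is hypothetical):

    struct Node { struct Node *left, *right; };   /* both NULL for a leaf */

    /* Registers needed to evaluate a sub-tree. A leaf loads into one
       register. For an operator, evaluate the needier operand first;
       the other side is then evaluated while one register stays
       occupied holding the first result. */
    int need(struct Node *n) {
        int l, r;
        if (n->left == NULL) return 1;            /* leaf operand */
        l = need(n->left);
        r = need(n->right);
        if (l == r) return l + 1;                 /* one extra to hold the first result */
        return l > r ? l : r;                     /* needier side first: no extra needed */
    }

For b+c*d this gives 2, with c*d evaluated first, which is exactly the shorter sequence above; and by this count the whole of z = a[b+c*d]-e[f+g*h] fits in 3 registers rather than 5.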

One structural aspect of the RC I am retaining is the procedural nature of its top-down parsing; that is, I'm not going for a table-driven parser. Instead, I am replacing the grammar phrases which define expression structure with a single format for an expression, namely that (broadly speaking) an expression is either a simple operand or two operands joined by a binary operator. Instead of defining factors, terms, etc., I associate a numeric precedence value with each operator, and the parsing is achieved by a recursive precedence-based parser which is in some ways similar to Dijkstra's well-known ShuntingAlgorithm.
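
For concreteness, the core of such a parser can be sketched as follows (the helper functions and their names are hypothetical placeholders, not the RC's actual interface):

    struct Node;                          /* expression tree node, as before */
    struct Node *mknode(int op, struct Node *l, struct Node *r);
    struct Node *parse_operand(void);     /* identifier, constant, or parenthesised expression */
    int peek(void);                       /* look at the next token without consuming it */
    int advance(void);                    /* consume and return the next token */
    int prec(int tok);                    /* operator precedence; 0 if tok is not an operator */

    /* Parse an expression all of whose operators bind at least as
       tightly as min; call parse_expr(1) for a full expression. */
    struct Node *parse_expr(int min) {
        struct Node *left = parse_operand();
        while (prec(peek()) >= min) {
            int op = advance();
            /* prec(op)+1 makes the operator left-associative; passing
               prec(op) unchanged would make it right-associative. */
            struct Node *right = parse_expr(prec(op) + 1);
            left = mknode(op, left, right);
        }
        return left;
    }

Adding a new operator then means adding one entry to the precedence function, rather than threading a new layer of factor/term procedures through the grammar.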