« THIS IS A FORK OF THE CALCULATOR, WHICH WILL COMPILE RAINER'S "LC" LANGUAGE »
/* Note: code forces a divide-by-zero error in order to allow point of error to be trapped by gdb, thus allowing a backtrace. Also... find my code which restarts any program automatically under gdb if not already invoked that way. Add it to the build so that we always get a backtrace on error. (explain the mechanism on compilers101) To do: instead of outputting decls, procs, and code all interspersed, have a tree-walk scan the code for each type of data and output only that type so that we can not only plant the __main__: label properly, but also output procs after the code, and output data to an appropriate psect. Need to ensure that the language supports '//' comments Seemed to hang indefinitely when given a badly formed const: '\' Lesson roadmap: minimise stack depth by rotating symmetric nodes, leads to minimising number of registers... (same algorithm, reuse code) See regression test "fncall.t" - fn name is displayed wrongly in ast debugging (although generated code appears OK). FIXED! Add the no. of cells in a trip to the trip itself (immediately after the opcode - adjust the macros accordingly...) so that things like function definitions and function calls are not linked lists. the trip-debugging code also needs to be adjusted to match. have the opcode field point to an opcode trip? similar level of flattening for ops at the same precedence level? (why, apart from it looks nicer as a tree in graphviz????) do a proper symbol table and generate a tree describing a declaration. generalise getvar and use 'var's where appropriate, clean up current messy code where we extract the string replace the lexer with the simpler one from the precedence demo? CFG: SERIALISE THE AST! Needed for loops/conditionals/gotos. how much of the non-flow-control can be left as a tree??? Do we need a 'link' field in the AST (or should I use a separate data structure altogether? There's no real need to...) Is this merely a case of flattening the tree by using the 'SEQUENCE' operator? How about just adding a label field to a sequence op? Is the LHS always code and the RHS always a link? Serialised AST *requires* REDIR nodes. Probably a lot of link scanning, since we can't easily compact code or index into it when making optimisation changes. to add: if/then/else while/for/until/break signal??? (error handling?) procedures/functions multiple result syntax? but not until the recent messes caused by adding simple functions have been cleaned up. Don't let the code deteriorate *and* bloat at the same time. Add I/O (put/get char from stdin/stdout, and keyboard/screen) arrays, structs, pointers??? Can we keep this a 'safe' language? strings as indexes into string pool. range checked. no absolute ram addresses ever. (Note: interpreter does *not* require serialisation! Can't let Tim get away with taking this shortcut!) See rainer's "LC" - are we missing any features it needs? Can we get away with a version that uses 32-bit ints and no chars/words? (maybe justify it by using UTF8 for our character set?) do we canonicalise all ast items so that a recursive compile() does the right thing, vs having to look down particular branches with a different procedure because the same node might have two different meanings in different contexts? I.e. an AST cleanup phase to make the actual code generation phase simpler. (& more drop-through cases) (first example that springs to mind - VARs in param list) maybe some times same code *needs* to be interpreted twice in different ways, eg one pass over params to add to symbol table and declare local storage, and then a second to pop from stack at entry to proc. conflating code by drop-through in switch statements. should DEFFN just be another DECLARE object with a type of FN? Add handler to generate code in different psects so that we can effectively switch output streams to, say, write procedures at the end of the file to avoid having to jump over them... (fix the trailing '\' bug in ctohtml/cfold! - or have taken read the C source, and warn if \ followed by a non-blank non-comment line?) extract procs to smaller files - plug-in swap of modules as new features added. Need a #include pre-processor (to regenerate single- file format for folded display)? (handle "..." includes only, not <>'s) (move cfold to windows... update unix copy with portability fixes before updating with function-call code.) generate HTML listing files and embed graphviz images. remove ascii tree stuff (which is rather big) Revisit having return address on data stack? How awkward would that be? Tree traversal to detect recursion? How expensive? (PDP15 soln) Wasn't even thinking about optimisations yet! (SSA etc) [what was the thing call that recombined a loop? Y-node? Combinator?] However... 'clean' functions allow compile-time constant folding of function calls. Is there a way to generalise this with the current interpreter rather than the usual methods of constant folding? Maybe passing back something like 'NAN' which causes fast exit from procs/loops etc whenever encountered???? how about "a = b + badfn() * c + d", where c is known to be 0. NAN scheme would cause early return and miss opportunity to optimise. inline replacement using AST is more powerful than C's macros because you can obey scoping rules and make it identical to a real fn call. (Reminiscent of lambda functions/functional language programming) almost forgot the old 'tail recursion' hack. is there any way we can generalise it and get even more cases? Question: do a version that duplicates Rainer's LC so that we can compile his port of ecce? Or edit the ecce port to match the new language (esp. Tim's version of the language, as a test piece...) Code generation - external calls via DLLs. Calling conventions, register conventions. Use of two conventions - one for internal calling, another for external callers that follows system standards. Simple stack machine - is there a macro assembler for windows that will let us compile using the current output??? If not should we write one? :-) Re earlier comment of psects - generate procedures to a serialised AST, allows us to output the code in arbitrary order (see Robertson's talk at Edinburgh, anecdote paper on Ackerman's) i.e. can we do the same tricks as Robertson's pass three by juggling AST nodes at some appropriate stage in the conversion from AST to CFG to code? */
// You are reading a C source file displayed using 'folds'. Click on one of the // sections below to unfold its contents. Reclicking, or clicking on the expanded // text, will collapse the expanded text back to the summary. // If you are interested in the software used here (rather than the actual program, // which is a demonstration compiler for the Yahoo 'compilers101' group) then // you can read about it at my blog: // // http://techennui.blogspot.com/2008/04/semi-literate-programming.html « Include files, consts, incidental globals »
« Parser support »
« Data structures »
// We use a stringpool, and strings are indexes into this pool. This // is useful for the same reasons that the AST is an array indexed // by integers rather than a struct with pointers. It may also // save space by reusing common strings. And we get a free tag // to describe strings. (Also we can compare strings just with // an integer tag comparison, if we ever want to) // not everything uses this yet. search for malloc and realloc and strdup and free to find potential problems #define MAXPOOL (1024*16) char stringpool[MAXPOOL]; int nextstring = 0; int str_to_pool(char *s) { int tag; for (tag = 0; tag <= nextstring; tag++) { if (strcmp(stringpool+tag, s) == 0) { return tag; /* found, one way or another */ } } } « C[] is the source character token stream »
« A[] is the Analysis record. Contents point to a C[] structure when the item is a BIP, otherwise the format is [phraseno] [altno] [no-of-subphrases] [subphrases and/or BIPs...] for example, if A[25] contained 10, that would mean that the token for A[25] was stored in C[10]. (And the C string would be at &stringpool[c[A[25]]]?) »
/* variables used by line-reconstruction (lexer) */ FILE *sourcefile; char *curfile; int startline = TRUE, whitespace = TRUE, lineno = 1, col = 0, ch, peek;
« Support procedures »
« Incidental I/O support »
« Support proc for storage management »
« Support procs for storing lexical units »
« simple proc to recognise if a token is a keyword »
« Main procedures - the parser and the code generator (which embodies the grammar) »
« Line reconstruction, which for this language equates to lexing »
static int xfgetc(FILE *f); static void xungetc(int c, FILE *f); void line_reconstruction(void) { « Pre-process input ready for parsing. Tokens are stored in array C[] »
for (;;) { ch = xfgetc(sourcefile); if (ch == EOF) break; ch &= 255; // int, positive. peek = xfgetc(sourcefile); xungetc(peek, sourcefile); if (isalpha(ch)) { « token or keyword »
} else if (isdigit(ch)) { « Number »
} else switch (ch) { case '$': « Hex constant \$[0-9a-fA-F]+ »
break; case '\'': // Handle 'c' character constants case '"': // Handle "string" « literals »
case '/': « COMMENTS (or just a divide symbol) »
// WHITESPACE case '\n': lineno++; case '\r': startline = TRUE; col = 0; whitespace = TRUE; break; case '\t': case ' ': col++; // Does not affect whitespace break; // DIRECTIVES case '#': // If we interpret any #-directives while lexing, we don't want to // do an expensive test on every token, so what we can do is set // a countdown timer on the introductory token (either this '#' // or the actual keyword such as 'ifdef') and then test that the // *previous* tokens match when the timer hits 0, eg // C[cp-3] == '#' && C[cp-2] == 'include' ... etc if (!whitespace) { // WARN: probably an error... should not be any '#' symbols in the // middle of a line. (This language uses "!=" or "<>" for not-equal) } // Drop through default: whitespace = FALSE; storec(ch, lineno, col++, TYPE_CHAR, curfile); } } // set up a dummy at the end because we sometimes look ahead by 1 // in the parsing code and don't want to hit uninitialised data. c[nextfree].t = TYPE_EOF; c[nextfree].s = "<EOF>"; c[nextfree].l = lineno; c[nextfree].col = col;
}
« Abstract Syntax Tree data structures »
typedef int TRIP; // A 'trip' was originally a 'triple' of <opcode, operand, operand> // Now we use n-ary tuples but retain the name for nostalgia's sake... #define MAXTRIPS (1024*10) int AST[MAXTRIPS]; /* Should use flex arrays here too... */ int nexttrip = 0; #ifdef DEBUG int checkast(TRIP idx) { if (idx < 0) { fprintf(stderr, "Run-time error: negative index AST[%d] is not valid!\n", idx); idx = idx / 0; } else if (idx >= MAXTRIPS) { fprintf(stderr, "Run-time error: AST[%d] is out of range (max %d)!\n", idx, MAXTRIPS); idx = idx / 0; } else { return idx; } } #else #define checkast(x) x #endif #define opsym(root) AST[checkast(root)] #define leftchild(root) AST[checkast((root)+1)] #define rightchild(root) AST[checkast((root)+2)] #define nthchild(root, n) AST[checkast((root)+n)] « Several tables describing the AST opcodes. Some are only relevant to diagnostic output. »
// the first 3 tables are used by the compiler: typedef enum opcode { « names of AST operators »
} OPCODE; int prio[] = { « priority level of infix operators »
}; int arity[] = { « Number of operands in the tuple for this opcode. Most are triples. »
}; // name was intended only for debugging, but in this initial implementation // it is also being used as the source of mnemonics for the code generator. // In real life, the machine opcodes would _not_ map 1:1 with AST operators. char *name[] = { « ascii representation of AST operator names for debugging »
}; // the following tables are only used for debugging: char *shortname[] = { « ascii representation of AST operators for display when drawing trees »
}; char *c_infix_op[] = { « ascii representation of infix operators for display. Non-infix ops have dummy values. »
}; int display_children[] = { « In tree-drawing code, how many children do we draw for this node? Not always the same as the arity. »
}; int display_leftchild[] = { « Do we display the left child of the node? »
}; int display_rightchild[] = { « Do we display the right child of the node? »
};
static char xline[1024]; static int xi = 0, xcur_line = 1; static void xungetc(int c, FILE *f) { if (xi > 0) { xi -= 1; } else if (c == '\n') { xcur_line -= 1; xi = strlen(xline); } ungetc(c, f); } static int xfgetc(FILE *f) { int c, ch; c = fgetc(f); if (c == EOF) return EOF; ch = c&255; if (ch == '\n') { xline[xi] = '\0'; xi = 0; (void)make_binary_tuple(LINE, xcur_line++, (int)strdup(xline)); } else xline[xi++] = ch; if (xi == 1023) xi = 1022; xline[xi] = '\0'; return c; } char *nameof(TRIP t) { « Extract the name of a variable or the value of a const. Stored in stringpool. Space is never reclaimed, it just grows linearly. Still, has to be better than using malloc and all the overhead/fragmentation that goes with it. »
} #ifdef DEBUG int checkop(int idx) { if (idx < 0) { fprintf(stderr, "Run-time error: negative index arity[%d] is not valid!\n", idx); idx = idx / 0; } else if (idx >= MAX_OPCODE) { fprintf(stderr, "Run-time error: arity[%d] is out of range (max %d)!\n", idx, (int)MAX_OPCODE); idx = idx / 0; } else { return idx; } } #else #define checkop(i) (i) #endif #define tripsize(i) arity[checkop((int)opsym(i))] void printtrip(TRIP i); // note we're taking a slight shortcut here by storing the opcode directly in the // first field. If fact a cleaner design might have been to make that field point to // an 'opcode' trip, which is a unary operator whose sole parameter is the opcode. // this would require one extra indirection at the point of access, which could // be hidden in a procedure call anyway. TRIP make_unary_tuple(OPCODE op, TRIP parm1) { « Create a tuple for a unary operator »
} TRIP make_binary_tuple(OPCODE op, TRIP parm1, TRIP parm2) { « Create a tuple for a binary operator »
} TRIP mkop(OPCODE op) { return make_binary_tuple(OPERATOR, op, -1); // rightchild could be ptr to string? } TRIP make_nary_tuple(OPCODE op, TRIP parm1, ...) { « Create a tuple for an n-ary operator. Uses stdargs for arbitrary no. of params »
} TRIP make_proc_name(int pooloffset) { return make_binary_tuple(TAG, 0, pooloffset); } TRIP make_int_const(int datatype, char *value) { return make_binary_tuple(CONST, datatype, (int)atol(value)); } TRIP make_real_const(int datatype, char *value) { return make_binary_tuple(CONST, datatype, (int)atof(value) /* Hacky. Should fix this. */); } TRIP getvar(char *s) { /* tag must exist */ « Look up a tag in the string pool. »
} TRIP newtag(char *s) { /* tag must *not* exist */ « Create a new tag and add it to the stringpool. »
return make_binary_tuple(TAG, 0, tag); }
« Debugging »
#if defined(DEBUG_TRIPS_AFTER) || defined(DEBUG_TRIPS_DURING) « Diagnostic procedure to print a triple. »
« This section really isn't very interesting. It's not structured for folding, and it's not relevant to the compiler algorithms. You can comfortably skip this part of the code... »
/* this is ONLY used for diagnostics and is the same in all test harness programs */ void printtrip(TRIP i) { int op, parm1, parm2; if (i < 0) { fprintf(stdout, "** Too small: TRIP %d\n", i); (void)*(int *)0; } if (i >= MAXTRIPS) { fprintf(stdout, "** Too large: TRIP %d\n", i); (void)*(int *)0; } op = opsym(i); parm1 = leftchild(i); parm2 = rightchild(i); switch (opsym(i)) { case DECLARESCALAR: fprintf(stdout, "// %d: %s [@AST %d] ; TODO - modify for different types\n", i, name[op], parm1); break; case VAR: fprintf(stdout, "// %d: %s %s[@AST %d] TYPE=%s\n", i, name[op], stringpool+rightchild(parm1), parm1, name[parm2]); break; case TAG: /* parm1 not used for the moment */ fprintf(stdout, "// %d: %s %s[@Stringpool %d]\n", i, name[op], stringpool+parm2, parm2); break; case DEFFN: fprintf(stdout, "// %d: %s \"%s\" [arglist @AST %d] [fnbody @AST %d]\n", i, name[op], stringpool+rightchild(leftchild(parm1)), parm2, nthchild(i, 3)); break; case CONST: fprintf(stdout, "// %d: %s %s %d\n", i, name[op], name[parm1], parm2); break; case OPERATOR: fprintf(stdout, "// %d: %s %s (Orphaned AST cell)\n", i, name[op], name[parm1]); break; case ASSIGNSCALAR: fprintf(stdout, "// %d: %s [declared @AST %d] [value @AST %d]\n", i, name[op], parm1, parm2); break; case IFTHENELSE: fprintf(stdout, "// %d: %s [condition @AST %d] [then-statement @AST %d] [else-statements @AST %d]\n", i, name[op], parm1, parm2, nthchild(i, 3)); break; case IFTHEN: fprintf(stdout, "// %d: %s [condition @AST %d] [then-statement @AST %d]\n", i, name[op], parm1, parm2); break; case LINE: fprintf(stdout, "// %d: %s %d: %s\n", i, name[op], parm1, (char *)parm2); break; case AND: case OR: case ADD: case SUB: case MUL: case DIV: case LSH: case RSH: case EXP: fprintf(stdout, "// %d: %s [@AST %d] [@AST %d]\n", i, name[op], parm1, parm2); break; case EQ: case NE: case LT: case GT: case LE: case GE: fprintf(stdout, "// %d: [@AST %d] %s [@AST %d]\n", i, parm1, name[op], parm2); break; case NEG: case NOT: fprintf(stdout, "// %d: %s [@AST %d]\n", i, name[op], parm1); break; default: if (arity[opsym(i)] == 3) { fprintf(stdout, "// %d: %s %d %d\n", i, name[op], parm1, parm2); } else if (arity[opsym(i)] == 2) { fprintf(stdout, "// %d: %s [@AST %d]\n", i, name[op], parm1); } else { fprintf(stdout, "// %d: %s ...? (%d ops)\n", i, name[op], arity[opsym(i)]); } break; } }
#endif #ifdef DEBUG_TREES « Ascii-art debugging procedure for drawing trees »
« OK, look, you really don't want to expand this section. It's not structured for folding, and it's not relevant to the compiler algorithms. Trust me, you can skip this... (and did I mention it was huge and monolithic???) »
int del = 4/*was 1*/; /* distance of graph columns */ int eps = 3; /* distance of graph lines */ /* interface for drawing (can be replaced by "real" graphic using GD or other) */ void graphInit (void); void graphFinish(); void graphBox (char *s, int *w, int *h); void graphDrawBox (char *s, int c, int l); void graphDrawArrow (int c1, int l1, int c2, int l2); /* recursive drawing of the syntax tree */ void exNode (int trip, int c, int l, int *ce, int *cm, int depth, int *needed); /*****************************************************************************/ /* main entry point of the manipulation of the syntax tree */ /* draw_tree is taken from the yacc/lex demo by Thomas Niemann at http://epaperpress.com/lexandyacc/ - I prefer my own drawing code whenever possible (the trees look prettier) but my code does not handle n-tuples where n > 2, so I fall back to this code if a diagram requires such a tuple. */ void draw_tree_orig(int root); void dottree(int root) { char *operator; TRIP leftkid, rightkid; int linkno = 0; operator = nameof(root); leftkid = leftchild(root); rightkid = rightchild(root); fprintf(stdout, "\"node%d\" [\n label = \"<f0> %d: %s ", root, root, operator); if (display_leftchild[opsym(root)]) fprintf(stdout, "| <f%0d> %d: ", ++linkno, root+1); if (display_rightchild[opsym(root)]) fprintf(stdout, "| <f%0d> %d: ", ++linkno, root+2); linkno = 0; fprintf(stdout, "\"\n shape = \"record\"\n];\n\n"); if (display_leftchild[opsym(root)]) dottree(leftkid); if (display_rightchild[opsym(root)]) dottree(rightkid); if (display_leftchild[opsym(root)]) fprintf(stdout, "\"node%0d\":f1 -> \"node%0d\":f0 [\n id = %d\n];\n\n", root, leftchild(root), 888); if (display_rightchild[opsym(root)]) fprintf(stdout, "\"node%0d\":f2 -> \"node%0d\":f0 [\n id = %d\n];\n\n", root, rightchild(root), 999); } void draw_tree(int trip) { int rte, rtm, needed; fprintf(stdout, "\nTree for AST[%d]:\n", trip); graphInit (); needed = FALSE; exNode (trip, 0, 0, &rte, &rtm, 0, &needed); if (needed) { graphFinish(); } else { draw_tree_orig(trip); } #ifdef DOT_SUPPORT fprintf(stdout, "digraph g {\n"); fprintf(stdout, "graph [\n"); fprintf(stdout, " rankdir = \"LR\"\n"); fprintf(stdout, "];\n\n"); fprintf(stdout, "node [\n"); fprintf(stdout, " fontsize = \"16\"\n"); fprintf(stdout, " shape = \"ellipse\"\n"); fprintf(stdout, "];\n\n"); fprintf(stdout, "edge [\n"); fprintf(stdout, "];\n\n"); dottree(trip); fprintf(stdout, "}\n"); #endif } /*c----cm---ce----> drawing of leaf-nodes l leaf-info */ /*c---------------cm--------------ce----> drawing of non-leaf-nodes l node-info * | * ------------- ...---- * | | | * v v v * child1 child2 ... child-n * che che che *cs cs cs cs * */ void indentsp(int d) { int i; for (i = 0; i < d*4; i++) { putchar(' '); } } void exNode ( int trip, int c, int l, /* start column and line of node */ int *ce, int *cm, /* resulting end column and mid of node */ int depth, int *needed ) { int op; int w, h; /* node width and height */ char *s; /* node text */ int cbar; /* "real" start column of node (centred above subnodes) */ int k; /* child number */ int che, chm; /* end column and mid of children */ int cs; /* start column of children */ char word[40]; //indentsp(depth);fprintf(stdout, "start: TRIP=%d startcol=%d startline=%d\n", trip, c, l); if (trip == -1) return; op = opsym(trip); if (display_children[op] >= 3) *needed = TRUE; s = nameof(trip); sprintf(word, "%s", s); s = word; //indentsp(depth);fprintf(stdout, "graphbox: s = %s\n", s); /* construct node text box */ graphBox (s, &w, &h); cbar = c; //indentsp(depth);fprintf(stdout, "assign: c=%d\n", c); *ce = c + w; *cm = c + w / 2; /* node is leaf */ if ( (op == CONST || op == VAR || display_children[op] == 0) ) { //indentsp(depth);fprintf(stdout, "drawbox: s = %s cbar=%d\n", s, cbar); graphDrawBox (s, cbar, l); return; } /* node has children */ cs = c; //indentsp(depth);fprintf(stdout, "node has %d children: cs=c=%d\n", display_children[op], c); for (k = 1; k <= display_children[op]; k++) { //indentsp(depth);fprintf(stdout, "%d: exnode1 %d cs=%d\n", k, nthchild(trip, k), cs); exNode (nthchild(trip, k), cs, l+h+eps, &che, &chm, depth+1, needed); cs = che; } /* total node width */ if (w < che - c) { cbar += (che - c - w) / 2; *ce = che; *cm = (c + che) / 2; } /* draw node */ //indentsp(depth);fprintf(stdout, "cbar=%d\n", cbar); graphDrawBox (s, cbar, l); /* draw arrows (not optimal: children are drawn a second time) */ cs = c; for (k = 1; k <= display_children[op]; k++) { //indentsp(depth);fprintf(stdout, "%d: exnode2 %d cs=%d\n", k, nthchild(trip, k), cs); exNode (nthchild(trip, k), cs, l+h+eps, &che, &chm, depth+1, needed); graphDrawArrow (*cm, l+h, chm, l+h+eps-1); cs = che; } } /* interface for drawing */ #define lmax 200 #define cmax 200 char graph[lmax][cmax]; /* array for ASCII-Graphic */ void graphTest (int l, int c) { int ok; ok = 1; if (l < 0) ok = 0; if (l >= lmax) ok = 0; if (c < 0) ok = 0; if (c >= cmax) ok = 0; if (ok) return; printf ("\n+++error: l=%d, c=%d not in drawing rectangle 0, 0 ... %d, %d", l, c, lmax, cmax); // fprintf (stderr, "\n+++error: l=%d, c=%d not in drawing rectangle 0, 0 ... %d, %d", // l, c, lmax, cmax); { int i, j; int lmx=20, cmx=60; for (i = 0; i < lmx; i++) { for (j = cmx-1; j > 0 && graph[i][j] == ' '; j--); graph[i][cmx-1] = 0; if (j < cmx-1) graph[i][j+1] = 0; if (graph[i][j] == ' ') graph[i][j] = 0; } for (i = lmx-1; i > 0 && graph[i][0] == 0; i--); printf ("\n"); for (j = 0; j <= i; j++) printf ("\n // %s", graph[j]); printf("\n"); }; exit (1); } void graphInit (void) { int i, j; for (i = 0; i < lmax; i++) { for (j = 0; j < cmax; j++) { graph[i][j] = ' '; } } } void graphFinish() { int i, j; char *s; for (i = 0; i < lmax; i++) { for (j = cmax-1; j > 0 && graph[i][j] == ' '; j--); graph[i][cmax-1] = 0; if (j < cmax-1) graph[i][j+1] = 0; if (graph[i][j] == ' ') graph[i][j] = 0; } for (i = lmax-1; i > 0 && graph[i][0] == 0; i--); printf ("\n"); for (j = 0; j <= i; j++) { char *p; s = graph[j]; // hacks to slightly improve formatting if (j == 0) s += 2; else if (j == 1) s += 1; else if ((p=strchr(s, '|')) == NULL || p[1]=='|') s += 2; printf ("\n // %s", s); } printf("\n\n"); } void graphBox (char *s, int *w, int *h) { *w = strlen (s) + del; *h = 1; } void graphDrawBox (char *s, int c, int l) { int i; //fprintf(stdout, "c=%d strlen=%d del=%d\n", c, strlen(s), del); graphTest (l, c+strlen(s)-1+del); for (i = 0; i < strlen (s); i++) { graph[l][c+i+del] = s[i]; } } void graphDrawArrow (int c1, int l1, int c2, int l2) { int m; graphTest (l1, c1); graphTest (l2, c2); m = (l1 + l2) / 2; while (l1 != m) { graph[l1][c1] = '|'; if (l1 < l2) l1++; else l1--; } while (c1 != c2) { graph[l1][c1] = '-'; if (c1 < c2) c1++; else c1--; } while (l1 != l2) { graph[l1][c1] = '|'; if (l1 < l2) l1++; else l1--; } graph[l1][c1] = '|'; } #endif /*-----------------------------------------------------------------------*/ /*-----------------------------------------------------------------------*/ #ifdef DEBUG_TREES /* See drawtree.c for test harness. Small mods made for this interface. */ /* NOTE!!! Now we have n-ary trees, this will not work. ifthenelse breaks */ static int tree_debug = (0!=0); static int vertical = (0==0); static int horizontal = (0==0); static int wide = (0!=0); static int trim = (0==0); static int testone = (0==0); // row col long pic[256][256]; // 0..255 is char, >= 256 is ptr to string int oldtextblit(int row, int col, char *src) { // post-processing string expansion int l = 0; for (;;) { if (*src == '\0') break; if (tree_debug) fprintf(stderr, "1: Planting '%c' at [%d][%d]\n", *src, row, col); pic[row][col++] = *src++; l += 1; } return l; } int textblit(int row, int col, char *src) { // store pointer to string, unpack later on output int l = strlen(src); pic[row][col] = (int)src; return (l+(wide?3:1))>>(wide?2:1); // half size because on diagonal } void layout(int id, int idx, int rowoff, int coloff, int *depth, int *width) { char *operator; int op; int leftkid, rightkid; int leftkiddepth = 0, leftkidwidth = 0; int rightkiddepth = 0, rightkidwidth = 0; int deltadepth = 0, deltawidth = 0; int i; if (tree_debug) fprintf(stderr, ">> %d:layout(%d, rowcol[%d][%d], depth %d, width %d);\n", id, idx, rowoff, coloff, *depth, *width); if (idx == -1) return; // was NOOP, now (null) operator = nameof(idx); leftkid = leftchild(idx); rightkid = rightchild(idx); // Anchor the corner node. (void)textblit(rowoff, coloff, operator); /* not strcpy - don't copy NUL */ deltawidth = 1; if (display_rightchild[opsym(idx)]) { int len = ((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1))+1; // text on the diagonal while (len-- > 1) {deltawidth += 1; pic[rowoff][coloff-1+deltawidth] = (vertical ? '\\' : '-');} // attach the RHS tree if (tree_debug) fprintf(stderr, "Recursing to right node %d\n", rightkid); layout(2*id, rightkid, rowoff, coloff+deltawidth, &rightkiddepth, &rightkidwidth); deltadepth = rightkiddepth; } else { deltadepth = 1; /* The op itself */ } // testing: correcting a typo if (((strlen(operator)+(wide?3:1))>>(wide?2:1)) >= deltawidth) deltawidth = ((strlen(operator)+(wide?3:1))>>(wide?2:1))+2; if (display_leftchild[opsym(idx)]) { // draw the down link // calculate extra height if ((((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1))) > deltadepth) { deltadepth = ((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1)); } for (i = 1; i < deltadepth+1 /* +1 for spacer row */; i++) { if (tree_debug) fprintf(stderr, "2: Planting '%c' at [%d][%d]\n", '/', rowoff+i, coloff); pic[rowoff+i][coloff] = (horizontal ? '/' : '|'); } // recurse on the LHS tree if (tree_debug) fprintf(stderr, "Recursing to left node %d\n", leftkid); layout(2*id+1, leftkid, rowoff+deltadepth+1, coloff, &leftkiddepth, &leftkidwidth); *depth = (*depth) + leftkiddepth + deltadepth + 1; } else *depth = (*depth) + deltadepth; if (rightkidwidth+deltawidth > leftkidwidth) { *width = (rightkidwidth+deltawidth); } else { *width = leftkidwidth; } if (tree_debug) fprintf(stderr, "<< %d:layout(%d, rowcol[%d][%d], depth %d, width %d);\n", id, idx, rowoff, coloff, *depth, *width); } void draw_tree_orig(int root) { int depth = 0, width = 0, row, col, offset, trimmable; fprintf(stdout, "\n"); // Init. for (col = 0; col < 256; col++) { for (row = 0; row < 256; row++) { pic[row][col] = ' '; } } /* Generate layout */ layout(1, root, 128, 0, &depth, &width); if (tree_debug) fprintf(stderr, "Dump layout: rows = %d cols = %d\n", depth, width); if (tree_debug) fflush(stderr); if (vertical) { /* apply vertical shear first */ offset = 1; for (col = 1; col < 256; col++) { // move this column down by 'offset' for (row = 255; row > offset; row--) { pic[row][col] = pic[row-offset][col]; pic[row-offset][col] = ' '; } offset += 1; } } if (horizontal) { /* apply horizontal shear next */ row = 255; // start at bottom of drawing offset = 0; for (;;) { static long temp[1024]; for (col = 0; col < 256; col++) { temp[col] = ' '; } for (col = 0; col < 256; col++) { temp[col*2+offset] = pic[row][col]; temp[col*2+offset+1] = ' '; } for (col = 0; col < 256; col++) { pic[row][col] = temp[col]; } if (row == 0) break; offset += 1; /* more shear on next row up */ row -= 1; } } if (trim) { trimmable = (0==0); for (;;) { for (row = 0; row < 256; row++) { if (pic[row][0] != ' ') { trimmable = (0!=0); break; } } if (!trimmable) break; for (row = 0; row < 256; row++) { for (col = 0; col+1 < 256; col++) { pic[row][col] = pic[row][col+1]; } pic[row][255] = ' '; } } } if (wide) { /* apply widening last */ row = 255; // start at bottom of drawing offset = 0; for (;;) { static long temp[1024]; for (col = 0; col < 256; col++) { temp[col] = ' '; } for (col = 0; col < 256; col++) { temp[col*2+offset] = pic[row][col]; temp[col*2+offset+1] = ' '; } for (col = 0; col < 256; col++) { pic[row][col] = temp[col]; } if (row == 0) break; row -= 1; } } /* display tree */ for (row = 0; row < 256; row++) { trimmable = (0 == 0); for (col = 0; col < 256; col++) { if (pic[row][col] != ' ') { trimmable = (0!=0); break; } } if (!trimmable) { fprintf(stdout, " "); // INDENT for (col = 255; col >= 0; col--) { if (pic[row][col] != ' ') break; pic[row][col] = '\0'; } printf(" // "); for (col = 0; col < 256; col++) { if ((pic[row][col] < -128) || (pic[row][col] > 255)) { oldtextblit(row, col, (char *)pic[row][col]); } else if (pic[row][col] == '\0') break; putchar(pic[row][col]); } putchar('\n'); } } putchar('\n'); fflush(stdout); return; }
#endif /* DEBUG_TREES */
« Parsing. (Table driven top-down recursive-descent parser) »
« The parser used here is based on a design by Tony Brooker which was originally used in Atlas Autocode and the "Compiler Compiler". It generates a concrete syntax tree rather than the abstract syntax tree more popular in modern compilers. A later phase converts from concrete to abstract. Note that the parsing procedure here is just a piece of code to walk a pre-built table. There is nothing in this section which reflects the grammar, if that is what you are looking for. You'll find the grammar embedded in the 'compile()' procedure in the following section. »
int cp = 0; // code pointer. Has to be global state. int ap = 0; // Analysis record pointer. int parse(int pp, int depth) // depth is only for indentation in diags { « Main parsing procedure. This is a table-driven parser, with the tables being generated from the grammar rules embedded in the 'compile' procedure below. The result of the parse is a tree structure, and the values of the nodes in the tree structure are used to drive a large 'case' statement which selects the actions to be performed after a successful parse. There is no grammatical structure embedded in this procedure. If you're looking for the grammar definition, see the procedure called 'compile' instead. »
int saved_cp, saved_ap, i, gp, alts, match; « Initialisation »
for (i = 0; i < alts; i++) { « Starting with the root phrase, recursively examine each alternative »
int each, phrases = gram[gp++], phrase_count, gap = 0; cp = saved_cp; ap = saved_ap; if (ap+3 > next_free_a) next_free_a = ap+3; makespace(A, next_free_a, a_size); A[ap++] = pp; // record which phrase (could be done outside loop) A[ap++] = i; // and which alt. // Count slots needed. *Could* be precalculated and stored // in the grammar, either embedded (after the ALT) or as a // separate table for (each = 0; each < phrases; each++) if (gram[gp+each] >= 512) gap++; A[ap++] = gap; // Count of alts (gap) // ap+gap now points to the slot after the space required, which // is where the first subphrase will be stored. ap = ap+gap; // recursive subphrases are stored after this phrase. // ap is left updated if successful. // successfully parsed phrases are stored in A[saved_ap+3+n] if (saved_ap+3+gap > next_free_a) next_free_a = saved_ap+3+gap; makespace(A, next_free_a, a_size); « Debug »
match = TRUE; // stays true if all subphrases match phrase_count = 0; // only phrases which make it into the A record, // i.e. excluding literals and keywords for (each = 0; each < phrases; each++) { « Within a single grammar rule (alternative), ensure that each subphrase is present »
int phrase = gram[gp+each]; « Debug »
if (phrase < 256) { « Literal »
} else if (phrase < 512) { « Keyword »
} else if (phrase < 512+MAX_BIP) { « Built-in phrase »
} else { « Recursive call to parser for a subphrase »
} « debug »
if (!match) break;
} gp += phrases; // move over all phrases, to next alt if (match) break; #ifdef DEBUG_PARSER else if (debug_parser) { indent(depth, stdout); fprintf(stdout, "** Alternative %d FAILED.\n", i+1); } #endif // gp now points to first element (length) of next alt, or start of next phrase
} return(match);
}
« Convert the concrete syntax tree into an abstract syntax tree. (the grammar itself is also embedded in this section) »
« This is primarily the main 'compile()' procedure, which is actually where the grammar of the language is defined. The grammar is extracted from in-line comments, and converted into a table by the 'takeon' program which you can find in the same directory as this file. (You can view the extracted grammar in file "varcalc.g") The style of compiler on which this design is based actually goes directly from concrete syntax tree to code generation - but that was from the days when memory was tight. Since most modern compilers - and especially books about them - are AST-based, we'll take that extra step here in order to give our students an AST-based compiler to experiment with. »
« Compiler tree-walking support code »
« In previous compilers, I had to write custom code for every tree-based optimisation, in order to walk down the tree to the right place to find the leaves to be optimised. In this one, I have a generic tree-walking procedure which can walk the entire program, but it can be customised so that it takes action only on specific phrases. This is possible in this design only because each set of subphrases stores the count of subphrases befoe it - thus allowing a generic tree-walking procedure that doesn't have to know what each node consists of until it happens across a node of the type it is looking for. However this only walks the concrete syntax tree - looks like we need something very similar to walk the abstract syntax tree as well... »
TRIP compile(int ap, int depth) { « Main code-generation procedure. This is called after parsing, with parameters which describe the parse tree. By jumping to the corresponding statement in the large 'case' below, we execute the actions associated with the parse-tree nodes. The grammar which was used to build the parse tables is extracted from the source below (from comments marked with "//\\") and the tables are built with the associated 'takeon' program. (See this same directory for the source. It's quite short...) »
int saved_ap; int phrase; // A[ap] is the phrase number. A[ap+1] is the alt. int alt; // For consistency, in BIPs, the Alt should always be 1 // although that may not be the case at the moment :-( int phrases; // defined subphrases int i; TRIP t1, t2, t3, t4, t5; // Temporaries // The following ecce command executed on this file will generate varcalc.g: // ecce -c "(v.//\\.s..(v/ /e)?m,k)0;%c" varcalc.c varcalc.g // May later tweak takeon.c to read from varcalc.c rather than varcalc.g // thus simplifying build process and becoming more like yacc. saved_ap = ap; phrase = A[ap++]; alt = A[ap++]; phrases = A[ap++]; #ifdef DEBUG fprintf(stdout, "compile(A[%d], %d) phrase=%s\n", saved_ap, depth, phrasename[phrase-512]); #endif switch (phrase) { « Built-in phrases »
//\\ //\\ # Phrase definitions. SSLIST is the main entry point. //\\ case P_SSLIST: //\\ //\\ P<SSLIST> = <STATEMENT> <SSLIST>, //\\ ; « (Click here to expand the code) »
//\\ P<SYNTAXERROR> = ; case P_SYNTAXERROR: fprintf(stderr, "*** Syntax error. Details later.\n"); exit(1); //\\ P<EXISTINGVAR> = <IDENT>; case P_EXISTINGVAR: t1 = compile(A[ap], depth+1); return make_binary_tuple(VAR, t1, INT); //\\ P<NEWVAR> = <IDENT>; case P_NEWVAR: t1 = compile(A[ap], depth+1); return make_binary_tuple(VAR, t1, INT); case P_STATEMENT: //\\ //\\ P<STATEMENT> = <EXISTINGVAR> ':' <SIMPLE> <OPTSEMI>, //\\ <SIMPLE> <OPTSEMI>; if (alt == 1) return compile(A[ap], depth+1); return -1; return make_binary_tuple(SEQUENCE, make_unary_tuple(LABEL, compile(A[ap], depth+1) /* tag */), compile(A[ap+1], depth+1)); case P_SIMPLE: //\\ //\\ P<SIMPLE> = <SS>, //\\ <CONSTDECL>, //\\ <VARDECL>, //\\ <DEFFN>, //\\ <ARRAY>, //\\ <EXTERN>, //\\ <PROCDEF> //\\ ; return compile(A[ap], depth+1); case P_EXTERN: //\\ P<EXTERN> = "extern" <EXISTINGVAR>; return -1; // not implemented case P_PROCDEF: //\\ P<PROCDEF> = "proc" <EXISTINGVAR> '(' <ARGLIST> ')' <OPTSEMI> <PROCBODY> "end"; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+3], depth+1); return make_nary_tuple(DEFFN, t1, t2, t3); case P_PROCBODY: //\\ P<PROCBODY> = <SS> <OPTSEMI> <PROCBODY>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+2], depth+1); if (t2 == -1) return t1; return make_binary_tuple(SEQUENCE, t1, t2); case P_ARGLIST: //\\ P<ARGLIST> = <EXISTINGVAR> <ARGLIST>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(DEFPARAM, t1, t2); case P_CONSTDECL: //\\ //\\ P<CONSTDECL> = "const" <EXISTINGVAR> '=' <NUM>; return -1; // not implemented case P_VARDECL: //\\ //\\ P<VARDECL> = "var" <VARDEC> <VARDECLIST>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; return make_binary_tuple(SEQUENCE, t1, t2); case P_VARDEC: //\\ //\\ P<VARDEC> = <IDENT> '=' <NUM>, //\\ <IDENT>; t1 = compile(A[ap], depth+1); if (alt == 0) { t2 = compile(A[ap+1], depth+1); t3 = make_binary_tuple(VAR, t1, INT); return make_binary_tuple(ASSIGNSCALAR, t3, t2); } return -1; //make_binary_tuple(VAR, t1, INT); no. var is a *use* of a variable. we need a decl opcode that does an EQU or whatever case P_VARDECLIST: //\\ //\\ P<VARDECLIST> = ',' <VARDEC> <VARDECLIST>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; return make_binary_tuple(SEQUENCE, t1, t2); case P_ARRAY: //\\ //\\ P<ARRAY> = "initialised" "array" <EXISTINGVAR> '[' <NUM> ']' '=' <INITLIST> <OPTSEMI>, //\\ "array" <IDENT> '[' <NUM> ']'; if (alt == 0) return -1; // not implemented t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); // t3 = declare_array(declare_int(), t2); // return make_binary_tuple(DECLAREARRAY, t1, t2); // TODO: lots to do here - 'type' should be a structure that includes the array size case P_INITLIST: //\\ //\\ P<INITLIST> = <NUM> <INITLIST>, //\\ ; return -1; // not implemented case P_DEFFN: //\\ //\\ P<DEFFN> = "let" <EXISTINGVAR> '(' <FORMALS> ')' '=' <BOOLEXPR>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); return make_nary_tuple(DEFFN, t1, t2, t3); case P_FORMALS: //\\ //\\ P<FORMALS> = <EXISTINGVAR> <RESTOFFORMALS>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(DEFPARAM, t1, t2); case P_RESTOFFORMALS: //\\ //\\ P<RESTOFFORMALS> = ',' <EXISTINGVAR> <RESTOFFORMALS>, ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(DEFPARAM, t1, t2); case P_SS: //\\ //\\ P<SS> = <EXISTINGVAR> '=' <BOOLEXPR> <OPTIF>, //\\ <EXISTINGVAR> '[' <EXPR> ']' '=' <BOOLEXPR> <OPTIF>, //\\ <PROCCALL> <OPTIF>, //\\ <IFSEQ>, //\\ <LOOP>, //\\ "return" <BOOLEXPR> <OPTIF>, //\\ "return" <OPTIF>, //\\ "jump" <EXISTINGVAR> <OPTIF>; if (alt == 0) { t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = make_binary_tuple(ASSIGNSCALAR, t1, t2); t4 = compile(A[ap+2], depth+1); if (t4 != -1) { rightchild(t4) = t3; // plug the 'then' part into the IFTHEN opcode return t4; } return t3; } else if (alt == 1) { t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); t4 = make_nary_tuple(ASSIGNARRAY, t1, t2, t3); t5 = compile(A[ap+3], depth+1); if (t5 != -1) { rightchild(t5) = t4; // plug the 'then' part into the IFTHEN opcode return t5; } return t4; } else if (alt == 2) { t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; rightchild(t2) = t1; return t2; } else if ((alt == 3) || (alt == 4)) { return compile(A[ap], depth+1); } else if (alt == 5) { // error if not in a function t1 = make_unary_tuple(RETURN, compile(A[ap], depth+1)); // function result t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; rightchild(t2) = t1; return t2; } else if (alt == 6) { // error if not in a procedure t1 = make_unary_tuple(RETURN, -1); // procedure return t2 = compile(A[ap], depth+1); if (t2 == -1) return t1; rightchild(t2) = t1; return t2; } else return -1; // not implemented case P_IFSEQ: //\\ P<IFSEQ> = "if" <BOOLEXPR> <OPTSEMI> <THENPART> <OPTELSEPART> "finish"; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+2], depth+1); t3 = compile(A[ap+3], depth+1); if (t3 == -1) return make_binary_tuple(IFTHEN, t1, t2); return make_nary_tuple(IFTHENELSE, t1, t2, t3); case P_THENPART: //\\ P<THENPART> = <SS> <OPTSEMI> <THENPART>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+2], depth+1); if (t2 == -1) return t1; return make_binary_tuple(SEQUENCE, t1, t2); case P_OPTELSEPART: // start with the simple case //\\ P<OPTELSEPART> = "else" <THENPART>, //\\ ; if (alt == 1) return -1; return compile(A[ap], depth+1); //-- case P_OPTELSEPART: //--\\ P<OPTELSEPART> = <OPTELSEIFPART> "else" <THENPART>, //--\\ ; //-- if (alt == 1) return -1; //-- t1 = compile(A[ap], depth+1); //-- t2 = compile(A[ap+1], depth+1); //-- case P_OPTELSEIFPART: //--\\ P<OPTELSEPART> = <ELSEIFLIT> '(' <BOOLEXPR> ')' <THENPART> <OPTELSEIFPART>, //--\\ ; //-- if (alt == 1) return -1; //-- t1 = compile(A[ap], depth+1); //-- t2 = compile(A[ap+1], depth+1); //-- if (t2 == -1) return t1; //-- return make_binary_tuple(SEQUENCE, t1, t2); //\\ P<ELSEIFLIT> = "else" "if", "elseif"; case P_LOOP: //\\ P<LOOP> = "cycle" <LOOPPART> "repeat" <OPTIF>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(REPEATIF, t1, t2); case P_LOOPPART: //\\ P<LOOPPART> = <SS> <OPTSEMI> <LOOPPART>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+2], depth+1); if (t2 == -1) return t1; return make_binary_tuple(SEQUENCE, t1, t2); case P_PROCCALL: //\\ //\\ P<PROCCALL> = <EXISTINGVAR> '(' <PARAMLIST> ')'; t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool) t2 = compile(A[ap+1], depth+1); // expr param t3 = leftchild(t1); // TAG (fn name) t4 = rightchild(t3); // stringpool offset return make_binary_tuple(FNCALL, make_proc_name(t4), t2); case P_OPTIF: //\\ //\\ P<OPTIF> = "if" <BOOLEXPR>, //\\ ; if (alt == 1) return -1; return make_binary_tuple(IFTHEN, compile(A[ap], depth+1), -1); // 'then' part is a hole to be filled. case P_BOOLEXPR: //\\ //\\ P<BOOLEXPR> = <BOOLTERM> <RESTOFBOOLTERM>; « (Click here to expand the code) »
case P_RESTOFBOOLTERM: //\\ //\\ P<RESTOFBOOLTERM> = <OROP> <BOOLTERM> <RESTOFBOOLTERM>, //\\ ; « (Click here to expand the code) »
case P_BOOLTERM: //\\ //\\ P<BOOLTERM> = <BOOLFACTOR> <RESTOFBOOLFACTOR>; « (Click here to expand the code) »
case P_RESTOFBOOLFACTOR: //\\ //\\ P<RESTOFBOOLFACTOR> = <ANDOP> <BOOLFACTOR> <RESTOFBOOLFACTOR>, //\\ ; « (Click here to expand the code) »
case P_BOOLFACTOR: //\\ //\\ P<BOOLFACTOR> = <OPTNOT> <RELATION>; « (Click here to expand the code) »
case P_OPTNOT: //\\ //\\ P<OPTNOT> = '!', //\\ ; if (alt == 0) return mkop(NOT); return -1; case P_OPTSEMI: //\\ //\\ P<OPTSEMI> = ';', //\\ ; return -1; case P_RELATION: //\\ //\\ P<RELATION> = <EXPR> <RESTOFRELATION>; « (Click here to expand the code) »
case P_RESTOFRELATION: //\\ //\\ P<RESTOFRELATION> = <RELOP> <EXPR>, //\\ ; « (Click here to expand the code) »
case P_EXPR: //\\ //\\ P<EXPR> = <SUM> <RESTOFEXPR>; « (Click here to expand the code) »
case P_RESTOFEXPR: //\\ //\\ P<RESTOFEXPR> = <SHIFTOP> <SUM> <RESTOFEXPR>, //\\ ; « (Click here to expand the code) »
case P_SUM: //\\ //\\ P<SUM> = <OPTADDOP> <TERM> <RESTOFSUM>; « (Click here to expand the code) »
case P_UNOP: //\\ //\\ P<UNOP> = '+', '-', '\\'; return (alt == 0 ? mkop(ADD) : (alt == 1 ? mkop(SUB) : mkop(NOT))); case P_ADDOP: //\\ //\\ P<ADDOP> = '+', '-'; return (alt == 0 ? mkop(ADD) : mkop(SUB)); case P_OPTADDOP: //\\ //\\ P<OPTADDOP> = <ADDOP>, //\\ ; « (Click here to expand the code) »
case P_RESTOFSUM: //\\ //\\ P<RESTOFSUM> = <ADDOP> <TERM> <RESTOFSUM>, //\\ ; « (Click here to expand the code) »
case P_TERM: //\\ //\\ P<TERM> = <FACTOR> <RESTOFTERM>; « (Click here to expand the code) »
case P_RESTOFTERM: //\\ //\\ P<RESTOFTERM> = <MULOP> <FACTOR> <RESTOFTERM>, //\\ ; « (Click here to expand the code) »
case P_OROP: //\\ //\\ P<OROP> = '||'; return mkop(OR); case P_ANDOP: //\\ //\\ P<ANDOP> = '&&'; return mkop(AND); case P_RELOP: //\\ //\\ P<RELOP> = '<>', '<=', '<', '>=', '>', '!=', '==', '#', '='; switch (alt) { case 0: case 7: case 5: return mkop(NE); case 1: return mkop(LE); case 2: return mkop(LT); case 3: return mkop(GE); case 4: return mkop(GT); case 6: case 8: return mkop(EQ); } case P_SHIFTOP: //\\ //\\ P<SHIFTOP> = '<<', '>>'; return (alt == 0 ? mkop(LSH) : mkop(RSH)); case P_MULOP: //\\ //\\ P<MULOP> = '*', '/'; return (alt == 0 ? mkop(MUL) : mkop(DIV)); case P_FACTOR: //\\ //\\ P<FACTOR> = '(' <BOOLEXPR> ')', //\\ <EXISTINGVAR> '(' <PARAMLIST> ')', //\\ <EXISTINGVAR> '[' <EXPR> ']', //\\ <EXISTINGVAR>, //\\ <NUMBER>; if (alt == 0) { return compile(A[ap], depth+1); } else if (alt == 1) { t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool) t2 = compile(A[ap+1], depth+1); // expr param t3 = leftchild(t1); // TAG (fn name) t4 = rightchild(t3); // stringpool offset return make_binary_tuple(FNCALL, make_proc_name(t4), t2); } else if (alt == 2) { // indexed array element t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool) t2 = compile(A[ap+1], depth+1); return make_binary_tuple(INDEX, t1, t2); } else if (alt == 3) { return compile(A[ap], depth+1); } else { // alt = 4 return compile(A[ap], depth+1); } case P_PARAMLIST: //\\ //\\ P<PARAMLIST> = <EXPR> <RESTOFPARAMLIST>, //\\ ; //\\ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(PARAM, t1, t2); case P_RESTOFPARAMLIST: //\\ //\\ P<RESTOFPARAMLIST> = ',' <EXPR> <RESTOFPARAMLIST>, //\\ ; //\\ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(PARAM, t1, t2); case P_NUMBER: //\\ //\\ P<NUMBER> = <NUM>, //\\ <UNOP> <NUM>, //\\ <CHARLIT>; return compile(A[ap], depth+1); //\\ //\\ E //\\ # 'E' is end of grammar. Everything after this is ignored. default: fprintf(stderr, "*** Internal error at line %d. ap=%d phrase=%d", __LINE__, ap, phrase); if (((phrase-512) >= 0) && ((phrase-512) < MAX_PHRASE)) fprintf(stderr, "\n (possible missing \"case P_%s:\" in compile()?)", phrasename[phrase-512]); fprintf(stderr, "\n"); t5=t5/0; exit(2); } return(-1); // DUMMY TRIP, NOTHING TO RETURN
}
« Code generators - one for three-address, one for a stack machine, and one run-time interpreter. At the moment, function definitions and calls have only been added to the stack-based machine »
static int nextlab = 0; « basic codegen_three_address() generates a simple 3-address intermediate code »
// Codegen is the guts of the compiler, which effectively serialises // the AST into sequentially executable statements. A further phase // is required to actually generate executable code. « Support I/O procs for codegen »
void declare(char *name) { fprintf(stdout, " int %s;\n", name); } void loadconst(char *format, TRIP dest, int value) { fprintf(stdout, " "); fprintf(stdout, format, dest, value); } void loadvar(char *format, TRIP dest, char *varname) { fprintf(stdout, " "); fprintf(stdout, format, dest, varname); } void store(char *format, char *varname, TRIP source) { fprintf(stdout, " "); fprintf(stdout, format, varname, source); } void operate(char *format, TRIP dest, TRIP leftop, char *op, TRIP rightop) { fprintf(stdout, " "); fprintf(stdout, format, dest, leftop, op, rightop); } void monoperate(char *format, TRIP dest, char *op, TRIP leftop) { fprintf(stdout, " "); fprintf(stdout, format, dest, op, leftop); } void put_goto(int lab) { fprintf(stdout, " "); fprintf(stdout, "goto L%02d;\n", lab); } void put_ifgoto(int cond, int lab, int sense) { fprintf(stdout, " "); fprintf(stdout, "if (%s_t%d) goto L%02d;\n", (sense ? "" : "!"), cond, lab); } void input(TRIP i) { /* TACC BUG! CANNOT ESCAPE double quotes properly */ fprintf(stdout, " "); fprintf(stdout, "fprintf(stderr, %c%s: %c); fflush(stderr); fscanf(stdin, %c%%d%c, &%s);\n", '"', nameof(leftchild(i)), '"', '"', '"', nameof(leftchild(i))); } void print(TRIP i) { fprintf(stdout, " "); fprintf(stdout, "fprintf(stdout, %c%%d\\n%c, _t%d); fflush(stdout);\n", '"', '"', leftchild(i)); } // this macro was used in debugging, not really needed now. #define put_label(n) xput_label(n, __LINE__) void xput_label(int lab, int line) { fprintf(stdout, "L%02d:\n", lab); }
void codegen_three_address(TRIP root) { if (root == -1) return; switch (opsym(root)) { case NOOP: break; case CONST: loadconst("_t%d = %d;\n", root, rightchild(root)); break; case DECLARESCALAR: // TODO: modify for different decl types root = leftchild(root); declare(stringpool+rightchild(leftchild(root))); break; case VAR: loadvar("_t%d = %s;\n", root, nameof(root)); break; case ASSIGNSCALAR: codegen_three_address(rightchild(root)); store("%s = _t%d;\n", nameof(leftchild(root)), rightchild(root)); break; case PARAM: codegen_three_address(leftchild(root)); codegen_three_address(rightchild(root)); break; case SEQUENCE: codegen_three_address(leftchild(root)); codegen_three_address(rightchild(root)); break; case INPUT: input(root); break; case PRINT: codegen_three_address(leftchild(root)); print(root); break; case NEG: case NOT: codegen_three_address(leftchild(root)); monoperate("_t%d = %s_t%d;\n", root, c_infix_op[opsym(root)], leftchild(root)); break; default: /* Be careful not to default anything other than binary operators! */ if (arity[opsym(root)] != 3) { fprintf(stdout, "*** Not Implemented: codegen_three_address(%s)\n", name[opsym(root)]); break; } codegen_three_address(leftchild(root)); codegen_three_address(rightchild(root)); operate("_t%d = (_t%d %s _t%d);\n", root, leftchild(root), c_infix_op[opsym(root)], rightchild(root)); break; } }
« codegen_stack generates for a stack machine »
void codegen_stack(TRIP root) { static char *current_function_name=""; // temp hack. real soln involves a scope/block-stack static int code_started = FALSE; if (root == -1) return; if ((opsym(root) == ASSIGNSCALAR) && !code_started) { // add all top-level opcodes // first actual code (we skip over function definitions) // BROKEN! doesn't work now we have procedure bodies // fprintf(stdout, "\n"); // fprintf(stdout, "_start: ; Main program entry\n"); code_started = TRUE; } switch (opsym(root)) { case CONST: fprintf(stdout, " PUSH #%d\n", rightchild(root)); break; case VAR: fprintf(stdout, " PUSH %s\n", nameof(root)); break; case ASSIGNSCALAR: codegen_stack(rightchild(root)); fprintf(stdout, " POP %s\n", nameof(leftchild(root))); break; case ASSIGNARRAY: codegen_stack(nthchild(root, 3)); fprintf(stdout, " PUSH &%s\n", nameof(leftchild(root))); // push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TODO) // might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high> codegen_stack(rightchild(root)); // array index fprintf(stdout, " ADD ; add offset to base (remember, word-addressed)\n"); fprintf(stdout, " STORE ; *pop() = pop()\n"); break; case LABEL: fprintf(stdout, "L_%s:\n", nameof(leftchild(root))); break; case DECLAREARRAY: break; case DEFFN: current_function_name = stringpool+rightchild(leftchild(leftchild(root))); fprintf(stdout, "%s:\n", current_function_name); // first we use the paramlist in rightchild to push a temporary definition // of each parameter on the namespace stack // push_temporary_parameter_definitions(rightchild(root)); // (and also generate code to pop params from stack to locals) codegen_stack(rightchild(root)); // then we compile code for the definition, which will pick up local parameters // for those idents rather than any globals of the same name // first pop the params off the data stack and assign to locals. // later implementation won't use static locals but will pick up off stack directly codegen_stack(nthchild(root, 3)); // restore_temporary_parameter_definitions(rightchild(root)); // TODO: don't output RET if last statement was 'return' fprintf(stdout, " RET\n"); // NOTE WE'RE ONLY HANDLING ONE IMPLICIT RESULT BY DEFAULT break; case DEFPARAM: // small tweak to ensure parameters popped in reverse order to undo pushes to stack codegen_stack(rightchild(root)); if (opsym(leftchild(root)) == DEFPARAM) codegen_stack(leftchild(root)); else { int formal = rightchild(leftchild(leftchild(root))); int local; static char locals[128]; sprintf(locals, "%s_%s", current_function_name, stringpool+formal); if (nextstring + strlen(locals) + 1 >= MAXPOOL) exit(1); // TODO: add message strcpy(stringpool+nextstring, locals); /* Create a backstop for when not found */ local = str_to_pool(locals); if (local != nextstring) { fprintf(stderr, "ERROR: local parameter '%s' already exists.\n", locals); } else nextstring += strlen(locals)+1; /* Not found, add it */ rightchild(leftchild(leftchild(root))) = local; // replace for the context of this fn def // the parameter at leftchild(root) is a "VAR" - currently we can cheat // and dive into the string directly, because our model of params (and variables) // is so dumb. However when we start using types properly this will have to change. fprintf(stdout, " POP %s\n", locals); } break; case PARAM: codegen_stack(leftchild(root)); codegen_stack(rightchild(root)); break; case RETURN: codegen_stack(leftchild(root)); // result is just a <BOOLEXPR>, or -1... fprintf(stdout, " RET\n"); break; case SEQUENCE: codegen_stack(leftchild(root)); codegen_stack(rightchild(root)); break; case IFTHEN: { static int nextlab = 1000; // for a proper branch, need to look at root node here... codegen_stack(leftchild(root)); // push the condition on the stack (True or False) fprintf(stdout, " BF F_%d\n", ++nextlab); codegen_stack(rightchild(root)); fprintf(stdout, "F_%d:\n", nextlab); } break; case IFTHENELSE: { static int nextlab = 1000; // for a proper branch, need to look at root node here... codegen_stack(leftchild(root)); // push the condition on the stack (True or False) fprintf(stdout, " BF T_%d\n", ++nextlab); codegen_stack(rightchild(root)); fprintf(stdout, " B E_%d\n", nextlab); fprintf(stdout, "T_%d:\n", nextlab); codegen_stack(nthchild(root, 3)); fprintf(stdout, "E_%d:\n", nextlab); } break; case REPEATIF: { static int nextlab = 1000; // for a proper branch, need to look at root node here... fprintf(stdout, "B_%d:\n", ++nextlab); codegen_stack(leftchild(root)); if (rightchild(root) != -1) { codegen_stack(leftchild(rightchild(root))); fprintf(stdout, " BT B_%d\n", nextlab); } else { fprintf(stdout, " B B_%d\n", nextlab); } } break; case FNCALL: codegen_stack(rightchild(root)); // zero or more params fprintf(stdout, " CALL %s\n", stringpool+rightchild(leftchild(root))); break; case INDEX: fprintf(stdout, " PUSH &%s\n", nameof(leftchild(root))); // push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TODO) // might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high> codegen_stack(rightchild(root)); // array index fprintf(stdout, " ADD ; add offset to base (remember, word-addressed)\n"); fprintf(stdout, " FETCH ; Push(*Pop())\n"); break; case NEG: case NOT: codegen_stack(leftchild(root)); fprintf(stdout, " %s\n", name[opsym(root)]); break; default: /* Be careful not to default anything other than binary operators! */ if (arity[opsym(root)] != 3) { fprintf(stdout, "*** Not Implemented: codegen_stack(%s)\n", name[opsym(root)]); break; } codegen_stack(leftchild(root)); codegen_stack(rightchild(root)); fprintf(stdout, " %s\n", name[opsym(root)]); break; } }
« Run-time interpreter: Cheap & nasty code execution directly from the AST! »
// Trivial run-time stack implementation: static int stack[128]; static int stackp = -1; void Push(int val) { stack[++stackp] = val; } int Pop(void) { return stack[stackp--]; } « (a little hack) »
void execute_AST(TRIP root) { TRIP left, right; if (root == -1) return; switch (opsym(root)) { case NOOP: break; case CONST: Push(rightchild(root)); break; case VAR: Push(variable_contents(root)); break; case ASSIGNSCALAR: // Until I add a "print" command to this language, we can see the results of // computations by a simple hack, which is to print out the value of any assignments. execute_AST(rightchild(root)); variable_contents(leftchild(root)) = Pop(); fprintf(stdout, " %s = %d\n", nameof(leftchild(root)), variable_contents(leftchild(root))); break; case SEQUENCE: execute_AST(leftchild(root)); execute_AST(rightchild(root)); break; case NEG: execute_AST(leftchild(root)); Push(-Pop()); break; case NOT: execute_AST(leftchild(root)); Push(!Pop()); // Boolean NOT, not bitwise NOT break; // AND, OR, case AND: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left && right); break; case OR: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left || right); break; // ADD, SUB, MUL, DIV, case ADD: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left + right); break; case SUB: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left - right); break; case MUL: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left * right); break; case DIV: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); if (right == 0) { fprintf(stderr, "Run-time error: divide by zero\n"); exit(1); } Push((int)(left / right)); break; // LSH, RSH, EXP, case LSH: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left << right); break; case RSH: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left >> right); break; case EXP: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); //Push(iexp(left, right)); {int temp = left; while (right >= 2) { left = left * temp; right -= 1; } } Push(left); break; // EQ, NE, LT, GT, LE, GE, case EQ: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left == right); break; case NE: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left != right); break; case LT: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left < right); break; case GT: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left > right); break; case LE: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left <= right); break; case GE: execute_AST(leftchild(root)); execute_AST(rightchild(root)); right = Pop(); left = Pop(); Push(left >= right); break; default: // INTERNAL ERROR! fprintf(stdout, "*** Not Implemented: execute_AST(%s)\n", name[opsym(root)]); exit(1); break; } //fprintf(stdout, "[%d] ", stack[stackp]); //printtrip(root); // look at the op we just executed }
int main(int argc, char **argv) { char *s; #ifdef DEBUG // GDB backtrace facility! // extern void restart_under_gdb(int argc, char **argv); // restart_under_gdb(argc, argv); #endif « Handle program arguments »
« Lexical scan »
« Call the parser »
« Generate code »
exit(0); return(1); }