//current area being worked on: search for zxcv /*< THIS IS A FORK OF THE CALCULATOR, WHICH WILL COMPILE RAINER'S "LC" LANGUAGE */ /*> Note: code forces a divide-by-zero error in order to allow point of error to be trapped by gdb, thus allowing a backtrace. To do: see codegen_c entry for DEFFN - tweak to determine int or void Walk_AST() - one pass for main body, another for procs, third for initialised decls, fourth for uninit decls create a codegen_src() procedure to re-output source fix occassional ASTcodes that pun on other ops. print reconstituted source at sequence points, rather than trying to print actual source. cst would have been better than AST however. can i link concrete syntax array (a) to AST (a) ???? (is a[] reused per statement? don't think so) REGRESSION TEST 45 FAILING! works when a label inserted... relatively soon: interpreter first, 3-address next. Replace AST with linearised code. use something like Robertson 3rd pass technique of blocks of irreducable code to be stitched together between things like branches ret, call etc. instead of outputting decls, procs, and code all interspersed, have a tree-walk scan the code for each type of data and output only that type so that we can not only plant the __main__: label properly, but also output procs after the code, and output data to an appropriate psect. code changed to force decl order at parse time fix local decls in procs. array decls, consts. fold <number> (-num) ADD SOURCE CODE TO AST PROPERLY. Perhaps hook it in at <SS> level? Add in all lines not previously added, for lineno = thisline? <-------- working on this now. error recovery currently broken. 'bestparse' syntax error pointer precedence for bitwise & and |. Duplicate full C exprs? array params? (dope vectors) Need to ensure that the language supports '//' comments Seemed to hang indefinitely when given a badly formed const: '\' Lesson roadmap: minimise stack depth by rotating symmetric nodes, leads to minimising number of registers... (same algorithm, reuse code) lesson: static stack depth equates to R<n> Add the no. of cells in a trip to the trip itself (immediately after the opcode - adjust the macros accordingly...) so that things like function definitions and function calls are not linked lists. the trip-debugging code also needs to be adjusted to match. have the opcode field point to an opcode trip? similar level of flattening for ops at the same precedence level? (why, apart from it looks nicer as a tree in graphviz????) do a proper symbol table and generate a tree describing a declaration. generalise getvar and use 'var's where appropriate, clean up current messy code where we extract the string replace the lexer with the simpler one from the precedence demo? CFG: SERIALISE THE AST! Needed for loops/conditionals/gotos. how much of the non-flow-control can be left as a tree??? Do we need a 'link' field in the AST (or should I use a separate data structure altogether? There's no real need to...) Is this merely a case of flattening the tree by using the 'SEQUENCE' operator? How about just adding a label field to a sequence op? Is the LHS always code and the RHS always a link? Serialised AST *requires* REDIR nodes. Probably a lot of link scanning, since we can't easily compact code or index into it when making optimisation changes. to add: '&', distinguish between BOOL and LOG AND, if/then/else while/for/until/break signal??? (error handling?) procedures/functions multiple result syntax? but not until the recent messes caused by adding simple functions have been cleaned up. Don't let the code deteriorate *and* bloat at the same time. Add I/O (put/get char from stdin/stdout, and keyboard/screen) arrays, structs, pointers??? Can we keep this a 'safe' language? strings as indexes into string pool. range checked. no absolute ram addresses ever. (Note: interpreter does *not* require serialisation! Can't let Tim get away with taking this shortcut!) See rainer's "LC" - are we missing any features it needs? Can we get away with a version that uses 32-bit ints and no chars/words? (maybe justify it by using UTF8 for our character set?) do we canonicalise all ast items so that a recursive compile() does the right thing, vs having to look down particular branches with a different procedure because the same node might have two different meanings in different contexts? I.e. an AST cleanup phase to make the actual code generation phase simpler. (& more drop-through cases) (first example that springs to mind - VARs in param list) maybe some times same code *needs* to be interpreted twice in different ways, eg one pass over params to add to symbol table and declare local storage, and then a second to pop from stack at entry to proc. conflating code by drop-through in switch statements. should DEFFN just be another DECLARE object with a type of FN? Add handler to generate code in different psects so that we can effectively switch output streams to, say, write procedures at the end of the file to avoid having to jump over them... (fix the trailing '\' bug in ctohtml/cfold! - or have taken read the C source, and warn if \ followed by a non-blank non-comment line?) extract procs to smaller files - plug-in swap of modules as new features added. Need a #include pre-processor (to regenerate single- file format for folded display)? (handle "..." includes only, not <>'s) (move cfold to windows... update unix copy with portability fixes before updating with function-call code.) generate HTML listing files and embed graphviz images. remove ascii tree stuff (which is rather big) Revisit having return address on data stack? How awkward would that be? Tree traversal to detect recursion? How expensive? (PDP15 soln) Wasn't even thinking about optimisations yet! (SSA etc) [what was the thing call that recombined a loop? Y-node? Combinator?] However... 'clean' functions allow compile-time constant folding of function calls. Is there a way to generalise this with the current interpreter rather than the usual methods of constant folding? Maybe passing back something like 'NAN' which causes fast exit from procs/loops etc whenever encountered???? how about "a = b + badfn() * c + d", where c is known to be 0. NAN scheme would cause early return and miss opportunity to optimise. inline replacement using AST is more powerful than C's macros because you can obey scoping rules and make it identical to a real fn call. (Reminiscent of lambda functions/functional language programming) almost forgot the old 'tail recursion' hack. is there any way we can generalise it and get even more cases? Question: do a version that duplicates Rainer's LC so that we can compile his port of ecce? Or edit the ecce port to match the new language (esp. Tim's version of the language, as a test piece...) Code generation - external calls via DLLs. Calling conventions, register conventions. Use of two conventions - one for internal calling, another for external callers that follows system standards. Simple stack machine - is there a macro assembler for windows that will let us compile using the current output??? If not should we write one? :-) Re earlier comment of psects - generate procedures to a serialised AST, allows us to output the code in arbitrary order (see Robertson's talk at Edinburgh, anecdote paper on Ackerman's) i.e. can we do the same tricks as Robertson's pass three by juggling AST nodes at some appropriate stage in the conversion from AST to CFG to code? */ // You are reading a C source file displayed using 'folds'. Click on one of the // sections below to unfold its contents. Reclicking, or clicking on the expanded // text, will collapse the expanded text back to the summary. // If you are interested in the software used here (rather than the actual program, // which is a demonstration compiler for the Yahoo 'compilers101' group) then // you can read about it at my blog: // // http://techennui.blogspot.com/2008/04/semi-literate-programming.html /*< Include files, consts, incidental globals */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <errno.h> #include <stdarg.h> #ifndef FALSE #define FALSE (0!=0) #define TRUE (0==0) #endif #define DEBUG TRUE /* simple programming debug features, remove when compiling optimised */ #define DEBUG_TRIPS_CODE TRUE #define DEBUG_TRIPS_AFTER TRUE #define DEBUG_TREES TRUE char *progname; /*>*/ /*< Parser support */ #define DEBUG_PARSER #include "varcalc.h" // GENERATED GRAMMAR FROM takeon.c and varcalc.g int debug_parser = FALSE; // Built-in phrase codes. Must match with grammar file. #define TYPE_EOF 0 #define TYPE_TAG 1 #define TYPE_STRING 2 #define TYPE_CHARCONST 3 #define TYPE_CHAR 4 #define TYPE_INT 5 #define TYPE_HEXINT 6 #define TYPE_KEYWORD 7 int bestparse = -1; // for error reporting. char *looking_for = "<UNKNOWN>"; // 'while looking for <PHRASENAME>' (or literal) ... /*>*/ /*< Data structures */ // We use a stringpool, and strings are indexes into this pool. This // is useful for the same reasons that the AST is an array indexed // by integers rather than a struct with pointers. It may also // save space by reusing common strings. And we get a free tag // to describe strings. (Also we can compare strings just with // an integer tag comparison, if we ever want to) // not everything uses this yet. search for malloc and realloc and strdup and free to find potential problems #define MAXPOOL (1024*32) char stringpool[MAXPOOL]; int nextstring = 0; int str_to_pool(char *s) { int tag; for (tag = 0; tag <= nextstring; tag++) { if (strcmp(stringpool+tag, s) == 0) { return tag; /* found, one way or another */ } } } /*< C[] is the source character token stream */ typedef struct sourceinfo { // ATOMS for processed input stream char *s; // string contents int l; // lineno int col; // column int t; // type - tag, "string", 'charconst', or char, so far char *f; // source or includefile name } sourceinfo; static sourceinfo *c = NULL; int nextfree = 0, arraysize = 0; char *onecharstr; /*>*/ /*< A[] is the Analysis record. Contents point to a C[] structure when the item is a BIP, otherwise the format is [phraseno] [altno] [no-of-subphrases] [subphrases and/or BIPs...] for example, if A[25] contained 10, that would mean that the token for A[25] was stored in C[10]. (And the C string would be at &stringpool[c[A[25]]]?) */ static int *A = NULL; /* Flex array, expanded using 'makespace' below */ int next_free_a = 0, a_size = 0; /*>*/ /* variables used by line-reconstruction (lexer) */ FILE *sourcefile; char *curfile; int startline = TRUE, whitespace = TRUE, lineno = 1, col = 0, ch, peek; /*>*/ /*< Support procedures */ /*< Incidental I/O support */ #define ishex(ch) (int)strchr("0123456789ABCDEFabcdef", ch) int hextobin(char x) { // Too crude, just a quick hack. Seldom called. if (('0' <= x) && (x <= '9')) { return x-'0'; } else if (('A' <= x) && (x <= 'F')) { return x-'A'+10; } else if (('a' <= x) && (x <= 'f')) { return x-'a'+10; } else { return 0; // ASSERT: INTERNAL ERROR! } } void indent(int depth, FILE *f) { int i; for (i = 0; i < depth; i++) fprintf(f, " "); } /*>*/ /*< Support proc for storage management */ /* I'm reasonably sure that I have a bug in this procedure. Compiling ecce crashes if MINSIZE is 1024. By greatly increasing the initial allocation we avoid whatever the problem is. MUST BE FIXED. TODO Should check null returns from calloc/realloc - may be as simple as running out of space. Possibly a problem with lack of contiguous blocks available - worst case, no block can be reused and we end up with N^2/2 space used instead of N. GOT IT! *arraysize = MINSIZE should be *arraysize = MINSIZE-1 */ static void *makespace_(void *c, int nextfree, int *arraysize, int objsize) { #define MINSIZE (1024*16) if ((c == NULL) || (*arraysize == 0) || (nextfree == 0)) { // TOO MANY TESTS! Need to pick the right one! c = calloc(MINSIZE, objsize); *arraysize = MINSIZE-1; } else if (nextfree >= *arraysize) { *arraysize = (*arraysize * 2) + 1; c = (void *)realloc(c, (*arraysize+1) * objsize); // 0:arraysize, inclusive. eg 0:15, 0:127 etc // this was causing a crash on the second call, on Windows/TCC. c was NULL on first call. } if (c == NULL) {fprintf(stderr, "makespace: %s\n", strerror(errno)); exit(errno);} return c; } #define makespace(c, nextfree, arraysize) c = (typeof(c))makespace_(c, nextfree, &arraysize, sizeof(c[0])) /*>*/ /*< Support procs for storing lexical units */ void stores(char *s, int lineno, int col, int type, char *fname) { int tag; if (nextstring + strlen(s) + 1 >= MAXPOOL) exit(1); // TODO: add message strcpy(stringpool+nextstring, s); /* Create a backstop for when not found */ tag = str_to_pool(s); if (tag == nextstring) nextstring += strlen(s)+1; /* Not found, add it */ makespace(c, nextfree, arraysize); c[nextfree].s = stringpool+tag; c[nextfree].l = lineno; c[nextfree].col = col; c[nextfree].f = fname; c[nextfree].t = type; nextfree++; } void storec(int ch, int lineno, int col, int type, char *fname) { onecharstr[ch*2] = ch; onecharstr[ch*2+1] = '\0'; // convert char to 1-char string before saving. stores(&onecharstr[ch*2], lineno, col, type, fname); } /*>*/ /*< simple proc to recognise if a token is a keyword */ int iskeyword(char *s) { int i; for (i = 0; i < MAX_KEYWORD; i++) if (strcmp(s, keyword[i]) == 0) return TRUE; return FALSE; } /*>*//*>*/ /*< Main procedures - the parser and the code generator (which embodies the grammar) */ /*< Line reconstruction, which for this language equates to lexing */ static int xfgetc(FILE *f); static void xungetc(int c, FILE *f); void line_reconstruction(void) { /*< Pre-process input ready for parsing. Tokens are stored in array C[] */ for (;;) { ch = xfgetc(sourcefile); if (ch == EOF) break; ch &= 255; // int, positive. peek = xfgetc(sourcefile); xungetc(peek, sourcefile); if (isalpha(ch)) { /*< token or keyword */ int nextfree = 0, strsize = 0, startcol = col; char *token = NULL; whitespace = FALSE; for (;;) { makespace(token, nextfree, strsize); if (isalpha(ch) || isdigit(ch) || (ch == '_')) { // digits and '_' allowed after 1st char. col++; token[nextfree++] = ch; } else { token[nextfree] = '\0'; xungetc(ch, sourcefile); break; } ch = xfgetc(sourcefile); } stores(token, lineno, startcol, iskeyword(token) ? TYPE_KEYWORD : TYPE_TAG, curfile); free(token); /*>*/ } else if (isdigit(ch)) { /*< Number */ int nextfree = 0, numsize = 0; char *number = NULL; // Store as a string... whitespace = FALSE; for (;;) { makespace(number, nextfree, numsize); if (isdigit(ch)) { col++; number[nextfree++] = ch; } else { number[nextfree] = '\0'; xungetc(ch, sourcefile); break; } ch = xfgetc(sourcefile); } stores(number, lineno, col, TYPE_INT, curfile); free(number); /*>*/ } else switch (ch) { case '$': /*< Hex constant \$[0-9a-fA-F]+ */ // Q: store the '$' in the string or not? whitespace = FALSE; col++; if (ishex(peek)) { int nextfree = 0, numsize = 0; char *number = NULL; for (;;) { makespace(number, nextfree, numsize); ch = xfgetc(sourcefile); if (ishex(ch)) { col++; number[nextfree++] = ch; } else { number[nextfree] = '\0'; xungetc(ch, sourcefile); break; } } stores(number, lineno, col, TYPE_HEXINT, curfile); free(number); } else { // Warn: probably an error... should not be any naked '$' symbols. // If the error to be prined would have been a generic syntax // error at the same location, then maybe give a more informative // error message such as "Unexpected character '$' near: ..." // On the other hand the generic mechanism probably reports this // almsost as accurately. storec(ch, lineno, col++, TYPE_CHAR, curfile); } /*>*/ break; case '\'': // Handle 'c' character constants case '"': // Handle "string" /*< literals */ { int nextfree = 0, strsize = 0, quotech = ch; char *string = NULL; whitespace = FALSE; col++; for (;;) { ch = xfgetc(sourcefile); // Newlines are allowed col++; makespace(string, nextfree, strsize); if (ch == '\\') { ch = xfgetc(sourcefile); col++; if (ch == '\\') { string[nextfree++] = ch; } else if (ch == '\'') { string[nextfree++] = '\''; } else if (ch == '"') { string[nextfree++] = '"'; } else if (ch == 'n') { string[nextfree++] = '\n'; } else if (ch == 'r') { string[nextfree++] = '\r'; } else if (ch == 't') { string[nextfree++] = '\t'; } else if (ch == 'x') { int x, x1, x2; x1 = xfgetc(sourcefile); col++; if (!ishex(x1)) { // WARN: Bad format continue; } x2 = xfgetc(sourcefile); col++; if (!ishex(x2)) { // WARN: Bad format continue; } x = (hextobin(x1)<<4) | hextobin(x2); if (x == 0) { // WARN: embedded NUL in a string is asking for trouble... } string[nextfree++] = x; } else { // Warn of unknown (to me) \x escape. Probably an error. string[nextfree++] = '\\'; string[nextfree++] = ch; } } else if (ch != quotech) { string[nextfree++] = ch; } else { string[nextfree] = '\0'; break; } } if (quotech == '\'') { if (strlen(string) == 1) { } else if (strlen(string) <= 4) { // Warn that 'xx' as a 32-bit int is a non-standard extension } else { // Warn that this is probably a string with the wrong type of quote. } } stores(string, lineno, col, (quotech == '\'' ? TYPE_CHARCONST : TYPE_STRING), curfile); free(string); } break; /*>*/ case '/': /*< COMMENTS (or just a divide symbol) */ col++; whitespace = FALSE; if (peek == '/') { // Handle line comment do {ch = xfgetc(sourcefile);} while (ch != '\n'); lineno++; col = 0; whitespace = TRUE; } else if (peek == '*') { /* Handle potential multi-line comment */ ch = xfgetc(sourcefile); // Now we have read '/*' for (;;) { col++; ch = xfgetc(sourcefile); peek = xfgetc(sourcefile); if ((ch == '*') && (peek == '/')) break; xungetc(peek, sourcefile); } col += 2; (void)xfgetc(sourcefile); // Remove '/' // QUESTION: How does this affect # directives? } else { storec(ch, lineno, col, TYPE_CHAR, curfile); } break; /*>*/ // WHITESPACE case '\n': lineno++; case '\r': startline = TRUE; col = 0; whitespace = TRUE; break; case '\t': case ' ': col++; // Does not affect whitespace break; // DIRECTIVES case '#': // If we interpret any #-directives while lexing, we don't want to // do an expensive test on every token, so what we can do is set // a countdown timer on the introductory token (either this '#' // or the actual keyword such as 'ifdef') and then test that the // *previous* tokens match when the timer hits 0, eg // C[cp-3] == '#' && C[cp-2] == 'include' ... etc if (!whitespace) { // WARN: probably an error... should not be any '#' symbols in the // middle of a line. (This language uses "!=" or "<>" for not-equal) } // Drop through default: whitespace = FALSE; storec(ch, lineno, col++, TYPE_CHAR, curfile); } } // set up a dummy at the end because we sometimes look ahead by 1 // in the parsing code and don't want to hit uninitialised data. c[nextfree].t = TYPE_EOF; c[nextfree].s = "<EOF>"; c[nextfree].l = lineno; c[nextfree].col = col; /*>*/ } /*>*/ /*< Abstract Syntax Tree data structures */ typedef int TRIP; // A 'trip' was originally a 'triple' of <opcode, operand, operand> // Now we use n-ary tuples but retain the name for nostalgia's sake... #define MAXTRIPS (1024*100) int AST[MAXTRIPS]; /* Should use flex arrays here too... */ int nexttrip = 0; #ifdef DEBUG int checkast(TRIP idx, int lineno) { if (idx < 0) { fprintf(stderr, "Run-time error at line %d: negative index AST[%d] is not valid!\n", lineno, idx); idx = idx/0; } else if (idx >= MAXTRIPS) { fprintf(stderr, "Run-time error at line %d: AST[%d] is out of range (max %d)!\n", lineno, idx, MAXTRIPS); idx = idx/0; } else { return idx; } } #else #define checkast(x,l) (x) #endif #define opsym(root) AST[checkast(root,__LINE__)] #define leftchild(root) AST[checkast((root)+1,__LINE__)] #define rightchild(root) AST[checkast((root)+2,__LINE__)] #define nthchild(root, n) AST[checkast((root)+n,__LINE__)] /*< Several tables describing the AST opcodes. Some are only relevant to diagnostic output. */ // the first 3 tables are used by the compiler: typedef enum opcode { /*< names of AST operators */ ERROR, NOOP, // we're now *only* using NOOP as a placeholder when swapping out DEFFN etc... TEMP HACK SEQUENCE, DEFFN, DEFPARAM, PARAM, CONST, TAG, LABEL, OPERATOR, ASSIGNSCALAR, ASSIGNARRAY, DECLARESCALAR, DECLAREARRAY, VAR, IFTHEN, IFTHENELSE, INPUT, PRINT, FNCALL, RETURN, BAND, BOR, ADD, SUB, MUL, DIV, LSH, RSH, LAND, LOR, EXP, NEG, NOT, PAREN, EQ, NE, LT, GT, LE, GE, ASS, INDEX, INT, REAL, WHILE, LOOP, REPEATIF, BREAK, CONTINUE, GOTO, REDIRECT, LINEAR_CODE, LINE, LINENO, // ---------- anything below this line is only for use in serialise_AST B, BT, BF, //BEQ, BNE, BLT, BGT, BLE, BGE, MAX_OPCODE /*>*/ } OPCODE; int prio[] = { /*< priority level of infix operators */ /* Ordered according to http://www.difranco.net/cop2220/op-prec.htm unary neg, not 200 exp 100 ? * / MUL DIV 90 + - ADD SUB 80 << >> RSH LSH 70 < <= > >= LT LE GT GE 60 == != EQ NEQ 50 & BAND 40 | BOR 30 && LOGAND 20 || LOGOR 10 */ /*ERROR*/999, /*NOOP*/999, /*SEQUENCE*/999, /*DEFFN*/999, /*DEFPARAM*/999, /*PARAM*/999, /*CONST*/999, /*TAG*/999, /*LABEL*/999, /*OPERATOR*/999, /*ASSIGNSCALAR*/999, /*ASSIGNARRAY*/999, /*DECLARESCALAR*/999, /*DECLAREARRAY*/999, /*VAR*/999, /*IFTHEN*/999, /*IFTHENELSE*/999, /*INPUT*/999, /*PRINT*/999, /*FNCALL*/999, /*RETURN*/999, /*BAND*/40, /*BOR*/30, /*ADD*/80, /*SUB*/80, /*MUL*/90, /*DIV*/90, /*LSH*/70, /*RSH*/70, /*LAND*/20, /*LOR*/10, /*EXP*/100, /*NEG*/200, /*NOT*/200, /*PAREN*/999, /*EQ*/50, /*NE*/50, /*LT*/60, /*GT*/60, /*LE*/60, /*GE*/60, /*ASS*/1, /*INDEX - probably does need a priority... */999, /*INT*/999, /*REAL*/999, /*WHILE*/999, /*LOOP*/999, /*REPEATIF*/999, /*BREAK*/999, /*CONTINUE*/999, /*GOTO*/999, /*REDIRECT*/999, /*LINEAR_CODE*/999, /*LINE*/999, /*LINENO*/999, // ---------- anything below this line is only for use in serialise_AST /*B*/999, /*BT*/999, /*BF*/999, //999/*BEQ*/, 999/*BNE*/, 999/*BLT*/, 999/*BGT*/, 999/*BLE*/, 999/*BGE*/, /*>*/ }; int arity[] = { /*< Number of operands in the tuple for this opcode. Most are triples. */ 1, 1, 3, 4 /* DEFFN */, 3 /* DEFPARAM*/, 3, 3, 3, 2 /* LABEL */, 3, 3, 4 /*ASSIGNARRAY*/, 3 /* DECLARESCALAR */, 4 /* DECLAREARRAY */, 3, 3, 4, 2, 2, 3 /* FNCALL */, /* Variable! Store in the struct? */ 2 /* RETURN */, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, /* NEG */ 2, /* NOT */ 2, /* PAREN */2, 3, 3, 3, 3, 3, 3, 3, 3 /*INDEX*/, 3, 3, 3, /*LOOP - wrong?*/3, 3, 3, 3, 2 /*GOTO*/, 3, /*LINEAR_CODE*/2, 3, 2, /*B*/2, /*BT*/2, /*BF*/2, //2/*BEQ*/, 2/*BNE*/, 2/*BLT*/, 2/*BGT*/, 2/*BLE*/, 2/*BGE*/, /*>*/ }; // name was intended only for debugging, but in this initial implementation // it is also being used as the source of mnemonics for the code generator. // In real life, the machine opcodes would _not_ map 1:1 with AST operators. char *name[] = { /*< ascii representation of AST operator names for debugging */ "ERROR", "NOOP", "SEQUENCE", "DEFFN", "DEFPARAM", "PARAM", "CONST", "TAG", "LABEL", "OPERATOR", "ASSIGNSCALAR", "ASSIGNARRAY", "DECLARESCALAR", "DECLAREARRAY", "VAR", "IFTHEN", "IFTHENELSE", "INPUT", "PRINT", "FNCALL", "RETURN", "BAND", "BOR", "ADD", "SUB", "MUL", "DIV", "LSH", "RSH", "LAND", "LOR", "EXP", "NEG", "NOT", "PAREN", "CMPEQ", "CMPNE", "CMPLT", "CMPGT", "CMPLE", "CMPGE", "ASS", "INDEX", "INT", "REAL", "WHILE", "LOOP", "REPEATIF", "BREAK", "CONTINUE", "GOTO", "REDIRECT", "LINEAR_CODE", "LINE", "LINENO", "B", "BT", "BF", //"BEQ", "BNE", "BLT", "BGT", "BLE", "BGE", /*>*/ }; // the following tables are only used for debugging: char *shortname[] = { /*< ascii representation of AST operators for display when drawing trees */ "ERROR", "%)" /* %) */, "';'", "DEFFN", "()" /* def (% */, "','", "CONST", "TAG", "LABEL", "OPERATOR", "=", "[]=", "DECLARESCALAR", "array", "var", "'if (...) then'", "'if (...) then ... else ...'", "INPUT", "PRINT", "(%" /* (% */, "=>", "'&&'", "'||'", "'+'", "'-'", "'*'", "'/'", "'<<'", "'>>'", "'&'", "'|'", "'^'", "'-'", "'~'", "'()'", "'=='", "'!='", "'<'", "'>'", "'<='", "'>='", "'='", "'[]'", "INT", "REAL", "WHILE", "LOOP", "REPEATIF", "BREAK", "CONTINUE", "GOTO", "REDIRECT", "LINEAR_CODE", "LINE", "LINENO", "B", "BT", "BF", //"BEQ", "BNE", "BLT", "BGT", "BLE", "BGE", /*>*/ }; char *c_infix_op[] = { /*< ascii representation of infix operators for display. Non-infix ops have dummy values. */ "ERROR", "NOOP", ";", "DEFFN", "DEFPARAM", ",", "CONST", "TAG", "LABEL", "OPERATOR", "=", "[]=", "DECLARESCALAR", "DECLAREARRAY", "VAR", "if (...) then", "if (...) then ... else ...", "INPUT", "PRINT", "FNCALL", "RETURN", "&&", "||", "+", "-", "*", "/", "<<", ">>", "&", "|", "^", "-", "~", "()", "==", "!=", "<", ">", "<=", ">=", "=", "[]", "INT", "REAL", "WHILE", "LOOP", "REPEATIF", "BREAK", "CONTINUE", "GOTO", "REDIRECT", "LINEAR_CODE", "LINE", "LINENO", "B", "BT", "BF", //"BEQ", "BNE", "BLT", "BGT", "BLE", "BGE", /*>*/ }; int display_children[] = { /*< In tree-drawing code, how many children do we draw for this node? Not always the same as the arity. */ 0, 0, 2, 3 /* DEFFN */, 2 /* DEFPARAM */, 2, 0, 0, 1, 0, 2, 3 /* ASSIGNSCALAR */, 1 /* DECLARESCALAR - name, type, initval */, 2 /* DECLAREARRAY - name, type, bounds initvals */, 0, 2, 3, 1, 1, 2 /* FNCALL */, /* Variable no of params */ 1 /* RETURN */, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0, /*WHILE*/1, /*LOOP*/0, /*REPEATIF*/2, 0, 0, 1 /*GOTO*/, /*REDIRECT*/0, /*LINEAR_CODE*/1, 0, /*LINENO - suppressed for now - was causing crash in Walk_AST*/0, /*B*/1, /*BT*/1, /*BF*/1, //1/*BEQ*/, 1/*BNE*/, 1/*BLT*/, 1/*BGT*/, 1/*BLE*/, 1/*BGE*/, /*>*/ }; int display_leftchild[] = { /*< Do we display the left child of the node? */ FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE /* ASSIGNSCALAR */, TRUE /*DECLARESCALAR*/, TRUE /*DECLAREARRAY*/, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE /* FNCALL */, TRUE /* RETURN */, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, /*B*/TRUE, /*BT*/TRUE, /*BF*/TRUE, //TRUE/*BEQ*/, TRUE/*BNE*/, TRUE/*BLT*/, TRUE/*BGT*/, TRUE/*BLE*/, TRUE/*BGE*/, /*>*/ }; int display_rightchild[] = { /*< Do we display the right child of the node? */ FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE /* ASSIGNSCALAR */, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE /* FNCALL */, /* Variable no of args >= 0 */ FALSE /* RETURN */, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, /*B*/FALSE, /*BT*/FALSE, /*BF*/FALSE, //FALSE/*BEQ*/, FALSE/*BNE*/, FALSE/*BLT*/, FALSE/*BGT*/, FALSE/*BLE*/, FALSE/*BGE*/, /*>*/ }; /*>*/ static char xline[1024]; static int xi = 0, xcur_line = 1; static void xungetc(int c, FILE *f) { if (xi > 0) { xi -= 1; } else if (c == '\n') { xcur_line -= 1; xi = strlen(xline); } ungetc(c, f); } static int xfgetc(FILE *f) { int c, ch; c = fgetc(f); if (c == EOF) return EOF; ch = c&255; if (ch == '\n') { xline[xi] = '\0'; xi = 0; (void)make_binary_tuple(LINE, xcur_line++, (int)strdup(xline)); } else xline[xi++] = ch; if (xi == 1023) xi = 1022; xline[xi] = '\0'; return c; } char *nameof(TRIP t) { /*< Extract the name of a variable or the value of a const. Stored in stringpool. Space is never reclaimed, it just grows linearly. Still, has to be better than using malloc and all the overhead/fragmentation that goes with it. */ char *result = stringpool+nextstring; int poolptr, op; if (nextstring + 128 >= MAXPOOL) exit(1); // TODO: add error message if (t == -1) { sprintf(result, "(null)"); } else { op = opsym(t); if (op == CONST) { sprintf(result, "%d", rightchild(t)); } else if (op == VAR) { // for simpler diagrams, skip a level sprintf(result, "%s", stringpool+rightchild(leftchild(t))); // punt to tag } else if (op == TAG) { // for simpler diagrams, skip a level sprintf(result, "%s", stringpool+rightchild(t)); } else { sprintf(result, "%s", shortname[op]); } } poolptr = str_to_pool(result); if (poolptr == nextstring) nextstring += strlen(stringpool+poolptr)+1; /* Not found, add it */ return(stringpool+poolptr); /*>*/ } #ifdef DEBUG int checkop(int idx, char *caller) { if (idx < 0) { fprintf(stderr, "Run-time error: negative index arity[%d] is not valid (in %s)!\n", idx, caller); idx = idx/0; } else if (idx >= MAX_OPCODE) { fprintf(stderr, "Run-time error: opsym(%d) is out of range (max %d)! (in %s)\n", idx, (int)MAX_OPCODE, caller); idx = idx/0; } else { return idx; } } #else #define checkop(i,c) (i) #endif #define tripsize(i) arity[checkop((int)opsym(i),"tripsize")] void printtrip(TRIP i); // note we're taking a slight shortcut here by storing the opcode directly in the // first field. If fact a cleaner design might have been to make that field point to // an 'opcode' trip, which is a unary operator whose sole parameter is the opcode. // this would require one extra indirection at the point of access, which could // be hidden in a procedure call anyway. TRIP make_unary_tuple(OPCODE op, TRIP parm1) { /*< Create a tuple for a unary operator */ int trip = nexttrip; #ifdef DEBUG //fprintf(stderr, "1:nexttrip <- %d\n", nexttrip); #endif opsym(trip) = op; nexttrip += tripsize(trip); leftchild(trip) = parm1; #ifdef DEBUG_TRIPS_DURING printtrip(trip); #endif return trip; /*>*/ } TRIP make_binary_tuple(OPCODE op, TRIP parm1, TRIP parm2) { /*< Create a tuple for a binary operator */ int trip = nexttrip; opsym(trip) = op; nexttrip += tripsize(trip); #ifdef DEBUG //fprintf(stderr, "2:nexttrip <- %d\n", nexttrip); #endif leftchild(trip) = parm1; rightchild(trip) = parm2; #ifdef DEBUG_TRIPS_DURING printtrip(trip); #endif return trip; /*>*/ } TRIP mkop(OPCODE op) { return make_binary_tuple(OPERATOR, op, -1); // rightchild could be ptr to string? } static int latest_line = -1; TRIP sequence(TRIP parm1, TRIP parm2) { TRIP t1,t2,t3; t1 = make_unary_tuple(LINENO, latest_line); t2 = make_binary_tuple(SEQUENCE, parm1, parm2); t3 = make_binary_tuple(SEQUENCE, t1, t2); return t3; } void showline(int line) { int i, l; l = c[0].l; for (i = 0; i < nextfree; i++) { //if (c[i].l != l) {if (c[i].l == line) fprintf(stdout, "\n %4d: ", c[i].l); l = c[i].l;} if (c[i].t == TYPE_CHARCONST) { if (c[i].l == line) fprintf(stdout, "'%s' ", c[i].s); } else { if (c[i].l == line) fprintf(stdout, "%s ", c[i].s); } } // fprintf(stdout, "\n\n"); } TRIP make_nary_tuple(OPCODE op, TRIP parm1, ...) { /*< Create a tuple for an n-ary operator. Uses stdargs for arbitrary no. of params */ // we use this for if/then/else, and procedure calls with parameter lists. int trip; int parm; va_list ptr; #ifdef DEBUG //fprintf(stderr, "n:nexttrip <- %d\n", nexttrip); #endif trip = nexttrip++; opsym(trip) = op; leftchild(trip) = parm1; va_start(ptr, parm1); for (parm = 2; parm <= arity[op]; parm++) { nthchild(trip, parm) = va_arg(ptr, TRIP); nexttrip += 1; } va_end(ptr); #ifdef DEBUG_TRIPS_DURING printtrip(trip); #endif return trip; /*>*/ } TRIP make_proc_name(int pooloffset) { return make_binary_tuple(TAG, 0, pooloffset); } TRIP make_int_const(int datatype, char *value) { return make_binary_tuple(CONST, datatype, (int)atol(value)); } TRIP make_real_const(int datatype, char *value) { return make_binary_tuple(CONST, datatype, (int)atof(value) /* Hacky. Should fix this. */); } TRIP getvar_from_tag(TRIP tag) { int i; TRIP trip; for (i = 0; i < nexttrip; i++) { trip = leftchild(i); if ((opsym(i) == VAR) && (rightchild(i) == INT) && (opsym(trip) == TAG) && (rightchild(trip) == tag) ) return i; } } TRIP getvar(char *s) { /* tag must exist */ /*< Look up a tag in the string pool. */ /* getvar needs to be more complex. Currently it just maps from a string to the triple for a pre-declared var, adding any unrecognised string as an int. Should really add it as an error type, and handle scope rules */ int i, trip, tag; // TODO: use str_to_pool instead strcpy(stringpool+nextstring, s); /* Create a backstop in case not found */ for (tag = 0; tag <= nextstring; tag++) { if (strcmp(stringpool+tag, s) == 0) { break; /* found, one way or another */ } } if (tag == nextstring) { /* Not found, auto-declare it */ nextstring += strlen(s)+1; fprintf(stderr, "getvar_1: cannot find declaration for '%s'! - autodeclaring as 'INT'\n", s); trip = make_binary_tuple(TAG, 0, tag); return make_binary_tuple(VAR, trip, INT); } // unfortunately after the recent restructuring to use the string pool // for everything, we may now have the string in the pool already even // though we haven't created the corresponding VAR /* having located the stringpool entry, now find the appropriate declaration whose tag is using it */ for (i = 0; i < nexttrip; i++) { trip = leftchild(i); if ((opsym(i) == VAR) && (rightchild(i) == INT) && (opsym(trip) == TAG) && (rightchild(trip) == tag) ) return i; } fprintf(stderr, "getvar_2: cannot find declaration for '%s'! - autodeclaring as 'INT'\n", s); trip = make_binary_tuple(TAG, 0, tag); return make_binary_tuple(VAR, trip, INT); /*>*/ } TRIP newtag(char *s) { /* tag must *not* exist */ /*< Create a new tag and add it to the stringpool. */ int i, trip, tag; if (nextstring + strlen(s) + 1 >= MAXPOOL) exit(1); // TODO: add message strcpy(stringpool+nextstring, s); /* Create a backstop for when not found */ tag = str_to_pool(s); if (tag != nextstring) { /////////////////////////////////////////////////////////////////// fprintf(stderr, "ERROR: name '%s' already exists.\n", s); TEMP OUT /* forget about scope rules for now */ } else nextstring += strlen(s)+1; /* Not found, add it */ /*>*/ return make_binary_tuple(TAG, 0, tag); } TRIP new_or_existingtag(char *s) { /* tag must *not* exist */ /*< Create a new tag and add it to the stringpool. */ int i, trip, tag; if (nextstring + strlen(s) + 1 >= MAXPOOL) exit(1); // TODO: add message strcpy(stringpool+nextstring, s); /* Create a backstop for when not found */ tag = str_to_pool(s); if (tag != nextstring) { return make_binary_tuple(TAG, 0, tag); } else nextstring += strlen(s)+1; /* Not found, add it */ /*>*/ return make_binary_tuple(TAG, 0, tag); } /*>*/ /*< Debugging */ #if defined(DEBUG_TRIPS_AFTER) || defined(DEBUG_TRIPS_DURING) || defined(DEBUG_TRIPS_CODE) /*< Diagnostic procedure to print a triple. */ /*< This section really isn't very interesting. It's not structured for folding, and it's not relevant to the compiler algorithms. You can comfortably skip this part of the code... */ /* this is ONLY used for diagnostics and is the same in all test harness programs */ void printtrip(TRIP i) { int op, parm1, parm2; if (i < 0) { fprintf(stdout, "** Too small: TRIP %d\n", i); (void)*(int *)0; } if (i >= MAXTRIPS) { fprintf(stdout, "** Too large: TRIP %d\n", i); (void)*(int *)0; } op = opsym(i); parm1 = leftchild(i); parm2 = rightchild(i); switch (opsym(i)) { case LINENO: fprintf(stdout, "// %d: LINENO %d\n", i, parm1); break; case DECLARESCALAR: fprintf(stdout, "// %d: %s [@AST %d] ; TODO - modify for different types\n", i, name[op], parm1); break; case VAR: fprintf(stdout, "// %d: %s %s[@AST %d] TYPE=%s\n", i, name[op], stringpool+rightchild(parm1), parm1, name[parm2]); break; case TAG: /* parm1 not used for the moment */ fprintf(stdout, "// %d: %s %s[@Stringpool %d]\n", i, name[op], stringpool+parm2, parm2); break; case DEFFN: fprintf(stdout, "// %d: %s \"%s\" [arglist @AST %d] [fnbody @AST %d]\n", i, name[op], stringpool+rightchild(leftchild(parm1)), parm2, nthchild(i, 3)); break; case CONST: fprintf(stdout, "// %d: %s %s %d\n", i, name[op], name[parm1], parm2); break; case OPERATOR: fprintf(stdout, "// %d: %s %s (Orphaned AST cell)\n", i, name[op], name[parm1]); break; case ASSIGNSCALAR: fprintf(stdout, "// %d: %s [declared @AST %d] [value @AST %d]\n", i, name[op], parm1, parm2); break; case IFTHENELSE: fprintf(stdout, "// %d: %s [condition @AST %d] [then-statement @AST %d] [else-statements @AST %d]\n", i, name[op], parm1, parm2, nthchild(i, 3)); break; case IFTHEN: fprintf(stdout, "// %d: %s [condition @AST %d] [then-statement @AST %d]\n", i, name[op], parm1, parm2); break; case LINE: fprintf(stdout, "// %d: %s %d: %s\n", i, name[op], parm1, (char *)parm2); break; case BAND: case BOR: case ADD: case SUB: case MUL: case DIV: case LSH: case RSH: case EXP: fprintf(stdout, "// %d: %s [@AST %d] [@AST %d]\n", i, name[op], parm1, parm2); break; case EQ: case NE: case LT: case GT: case LE: case GE: fprintf(stdout, "// %d: [@AST %d] %s [@AST %d]\n", i, parm1, name[op], parm2); break; case NEG: case NOT: fprintf(stdout, "// %d: %s [@AST %d]\n", i, name[op], parm1); break; default: if (arity[opsym(i)] == 3) { fprintf(stdout, "// %d: %s %d %d\n", i, name[op], parm1, parm2); } else if (arity[opsym(i)] == 2) { fprintf(stdout, "// %d: %s [@AST %d]\n", i, name[op], parm1); } else { fprintf(stdout, "// %d: %s ...? (%d ops)\n", i, name[op], arity[opsym(i)]); } break; } } /*>*//*>*/ #endif #ifdef DEBUG_TREES /*< Ascii-art debugging procedure for drawing trees */ /*< OK, look, you really don't want to expand this section. It's not structured for folding, and it's not relevant to the compiler algorithms. Trust me, you can skip this... (and did I mention it was huge and monolithic???) */ int del = /*was 1*/4; /* distance of graph columns */ int eps = 3; /* distance of graph lines */ /* interface for drawing (can be replaced by "real" graphic using GD or other) */ void graphInit (void); void graphFinish(); void graphBox (char *s, int *w, int *h); void graphDrawBox (char *s, int c, int l); void graphDrawArrow (int c1, int l1, int c2, int l2); /* recursive drawing of the syntax tree */ void exNode (int trip, int c, int l, int *ce, int *cm, int depth, int *needed); /*****************************************************************************/ /* main entry point of the manipulation of the syntax tree */ /* draw_tree is taken from the yacc/lex demo by Thomas Niemann at http://epaperpress.com/lexandyacc/ - I prefer my own drawing code whenever possible (the trees look prettier) but my code does not handle n-tuples where n > 2, so I fall back to this code if a diagram requires such a tuple. */ void draw_tree_orig(int root); void dottree(int root) { char *operator; TRIP leftkid, rightkid; int linkno = 0; operator = nameof(root); leftkid = leftchild(root); rightkid = rightchild(root); fprintf(stdout, "\"node%d\" [\n label = \"<f0> %d: %s ", root, root, operator); if (display_leftchild[opsym(root)]) fprintf(stdout, "| <f%0d> %d: ", ++linkno, root+1); if (display_rightchild[opsym(root)]) fprintf(stdout, "| <f%0d> %d: ", ++linkno, root+2); linkno = 0; fprintf(stdout, "\"\n shape = \"record\"\n];\n\n"); if (display_leftchild[opsym(root)]) dottree(leftkid); if (display_rightchild[opsym(root)]) dottree(rightkid); if (display_leftchild[opsym(root)]) fprintf(stdout, "\"node%0d\":f1 -> \"node%0d\":f0 [\n id = %d\n];\n\n", root, leftchild(root), 888); if (display_rightchild[opsym(root)]) fprintf(stdout, "\"node%0d\":f2 -> \"node%0d\":f0 [\n id = %d\n];\n\n", root, rightchild(root), 999); } void draw_tree(int trip) { int rte, rtm, needed; fprintf(stdout, "\nTree for AST[%d]:\n", trip); graphInit (); needed = FALSE; exNode (trip, 0, 0, &rte, &rtm, 0, &needed); if (needed) { graphFinish(); } else { draw_tree_orig(trip); } #ifdef DOT_SUPPORT fprintf(stdout, "digraph g {\n"); fprintf(stdout, "graph [\n"); fprintf(stdout, " rankdir = \"LR\"\n"); fprintf(stdout, "];\n\n"); fprintf(stdout, "node [\n"); fprintf(stdout, " fontsize = \"16\"\n"); fprintf(stdout, " shape = \"ellipse\"\n"); fprintf(stdout, "];\n\n"); fprintf(stdout, "edge [\n"); fprintf(stdout, "];\n\n"); dottree(trip); fprintf(stdout, "}\n"); #endif } /*c----cm---ce----> drawing of leaf-nodes l leaf-info */ /*c---------------cm--------------ce----> drawing of non-leaf-nodes l node-info * | * ------------- ...---- * | | | * v v v * child1 child2 ... child-n * che che che *cs cs cs cs * */ void indentsp(int d) { int i; for (i = 0; i < d*4; i++) { putchar(' '); } } void exNode ( int trip, int c, int l, /* start column and line of node */ int *ce, int *cm, /* resulting end column and mid of node */ int depth, int *needed ) { int op; int w, h; /* node width and height */ char *s; /* node text */ int cbar; /* "real" start column of node (centred above subnodes) */ int k; /* child number */ int che, chm; /* end column and mid of children */ int cs; /* start column of children */ char word[40]; //indentsp(depth);fprintf(stdout, "start: TRIP=%d startcol=%d startline=%d\n", trip, c, l); if (trip == -1) return; op = opsym(trip); if (display_children[op] >= 3) *needed = TRUE; s = nameof(trip); sprintf(word, "%s", s); s = word; //indentsp(depth);fprintf(stdout, "graphbox: s = %s\n", s); /* construct node text box */ graphBox (s, &w, &h); cbar = c; //indentsp(depth);fprintf(stdout, "assign: c=%d\n", c); *ce = c + w; *cm = c + w / 2; /* node is leaf */ if ( (op == CONST || op == VAR || display_children[op] == 0) ) { //indentsp(depth);fprintf(stdout, "drawbox: s = %s cbar=%d\n", s, cbar); graphDrawBox (s, cbar, l); return; } /* node has children */ cs = c; //indentsp(depth);fprintf(stdout, "node has %d children: cs=c=%d\n", display_children[op], c); for (k = 1; k <= display_children[op]; k++) { //indentsp(depth);fprintf(stdout, "%d: exnode1 %d cs=%d\n", k, nthchild(trip, k), cs); exNode (nthchild(trip, k), cs, l+h+eps, &che, &chm, depth+1, needed); cs = che; } /* total node width */ if (w < che - c) { cbar += (che - c - w) / 2; *ce = che; *cm = (c + che) / 2; } /* draw node */ //indentsp(depth);fprintf(stdout, "cbar=%d\n", cbar); graphDrawBox (s, cbar, l); /* draw arrows (not optimal: children are drawn a second time) */ cs = c; for (k = 1; k <= display_children[op]; k++) { //indentsp(depth);fprintf(stdout, "%d: exnode2 %d cs=%d\n", k, nthchild(trip, k), cs); exNode (nthchild(trip, k), cs, l+h+eps, &che, &chm, depth+1, needed); graphDrawArrow (*cm, l+h, chm, l+h+eps-1); cs = che; } } /* interface for drawing */ #define lmax 2000 #define cmax 2000 char graph[lmax][cmax]; /* array for ASCII-Graphic */ void graphTest (int l, int c) { int ok; ok = 1; if (l < 0) ok = 0; if (l >= lmax) ok = 0; if (c < 0) ok = 0; if (c >= cmax) ok = 0; if (ok) return; printf ("\n+++error: l=%d, c=%d not in drawing rectangle 0, 0 ... %d, %d", l, c, lmax, cmax); // fprintf (stderr, "\n+++error: l=%d, c=%d not in drawing rectangle 0, 0 ... %d, %d", // l, c, lmax, cmax); { int i, j; int lmx=20, cmx=60; for (i = 0; i < lmx; i++) { for (j = cmx-1; j > 0 && graph[i][j] == ' '; j--); graph[i][cmx-1] = 0; if (j < cmx-1) graph[i][j+1] = 0; if (graph[i][j] == ' ') graph[i][j] = 0; } for (i = lmx-1; i > 0 && graph[i][0] == 0; i--); printf ("\n"); for (j = 0; j <= i; j++) printf ("\n // %s", graph[j]); printf("\n"); }; exit (1); } void graphInit (void) { int i, j; for (i = 0; i < lmax; i++) { for (j = 0; j < cmax; j++) { graph[i][j] = ' '; } } } void graphFinish() { int i, j; char *s; for (i = 0; i < lmax; i++) { for (j = cmax-1; j > 0 && graph[i][j] == ' '; j--); graph[i][cmax-1] = 0; if (j < cmax-1) graph[i][j+1] = 0; if (graph[i][j] == ' ') graph[i][j] = 0; } for (i = lmax-1; i > 0 && graph[i][0] == 0; i--); printf ("\n"); for (j = 0; j <= i; j++) { char *p; s = graph[j]; // hacks to slightly improve formatting if (j == 0) s += 2; else if (j == 1) s += 1; else if ((p=strchr(s, '|')) == NULL || p[1]=='|') s += 2; printf ("\n // %s", s); } printf("\n\n"); } void graphBox (char *s, int *w, int *h) { *w = strlen (s) + del; *h = 1; } void graphDrawBox (char *s, int c, int l) { int i; //fprintf(stdout, "c=%d strlen=%d del=%d\n", c, strlen(s), del); graphTest (l, c+strlen(s)-1+del); for (i = 0; i < strlen (s); i++) { graph[l][c+i+del] = s[i]; } } void graphDrawArrow (int c1, int l1, int c2, int l2) { int m; graphTest (l1, c1); graphTest (l2, c2); m = (l1 + l2) / 2; while (l1 != m) { graph[l1][c1] = '|'; if (l1 < l2) l1++; else l1--; } while (c1 != c2) { graph[l1][c1] = '-'; if (c1 < c2) c1++; else c1--; } while (l1 != l2) { graph[l1][c1] = '|'; if (l1 < l2) l1++; else l1--; } graph[l1][c1] = '|'; } #endif /*-----------------------------------------------------------------------*/ /*-----------------------------------------------------------------------*/ #ifdef DEBUG_TREES /* See drawtree.c for test harness. Small mods made for this interface. */ /* NOTE!!! Now we have n-ary trees, this will not work. ifthenelse breaks */ static int tree_debug = (0!=0); static int vertical = (0==0); static int horizontal = (0==0); static int wide = (0!=0); static int trim = (0==0); static int testone = (0==0); // row col long pic[256][256]; // 0..255 is char, >= 256 is ptr to string int oldtextblit(int row, int col, char *src) { // post-processing string expansion int l = 0; for (;;) { if (*src == '\0') break; if (tree_debug) fprintf(stderr, "1: Planting '%c' at [%d][%d]\n", *src, row, col); pic[row][col++] = *src++; l += 1; } return l; } int textblit(int row, int col, char *src) { // store pointer to string, unpack later on output int l = strlen(src); pic[row][col] = (int)src; return (l+(wide?3:1))>>(wide?2:1); // half size because on diagonal } void layout(int id, int idx, int rowoff, int coloff, int *depth, int *width) { char *operator; int op; int leftkid, rightkid; int leftkiddepth = 0, leftkidwidth = 0; int rightkiddepth = 0, rightkidwidth = 0; int deltadepth = 0, deltawidth = 0; int i; if (tree_debug) fprintf(stderr, ">> %d:layout(%d, rowcol[%d][%d], depth %d, width %d);\n", id, idx, rowoff, coloff, *depth, *width); if (idx == -1) return; // was NOOP, now (null) operator = nameof(idx); leftkid = leftchild(idx); rightkid = rightchild(idx); // Anchor the corner node. (void)textblit(rowoff, coloff, operator); /* not strcpy - don't copy NUL */ deltawidth = 1; if (display_rightchild[opsym(idx)]) { int len = ((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1))+1; // text on the diagonal while (len-- > 1) {deltawidth += 1; pic[rowoff][coloff-1+deltawidth] = (vertical ? '\\' : '-');} // attach the RHS tree if (tree_debug) fprintf(stderr, "Recursing to right node %d\n", rightkid); layout(2*id, rightkid, rowoff, coloff+deltawidth, &rightkiddepth, &rightkidwidth); deltadepth = rightkiddepth; } else { deltadepth = 1; /* The op itself */ } // testing: correcting a typo if (((strlen(operator)+(wide?3:1))>>(wide?2:1)) >= deltawidth) deltawidth = ((strlen(operator)+(wide?3:1))>>(wide?2:1))+2; if (display_leftchild[opsym(idx)]) { // draw the down link // calculate extra height if ((((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1))) > deltadepth) { deltadepth = ((strlen(nameof(leftkid))+(wide?3:1))>>(wide?2:1)); } for (i = 1; i < deltadepth+1 /* +1 for spacer row */; i++) { if (tree_debug) fprintf(stderr, "2: Planting '%c' at [%d][%d]\n", '/', rowoff+i, coloff); pic[rowoff+i][coloff] = (horizontal ? '/' : '|'); } // recurse on the LHS tree if (tree_debug) fprintf(stderr, "Recursing to left node %d\n", leftkid); layout(2*id+1, leftkid, rowoff+deltadepth+1, coloff, &leftkiddepth, &leftkidwidth); *depth = (*depth) + leftkiddepth + deltadepth + 1; } else *depth = (*depth) + deltadepth; if (rightkidwidth+deltawidth > leftkidwidth) { *width = (rightkidwidth+deltawidth); } else { *width = leftkidwidth; } if (tree_debug) fprintf(stderr, "<< %d:layout(%d, rowcol[%d][%d], depth %d, width %d);\n", id, idx, rowoff, coloff, *depth, *width); } void draw_tree_orig(int root) { int depth = 0, width = 0, row, col, offset, trimmable; fprintf(stdout, "\n"); // Init. for (col = 0; col < 256; col++) { for (row = 0; row < 256; row++) { pic[row][col] = ' '; } } /* Generate layout */ layout(1, root, 128, 0, &depth, &width); if (tree_debug) fprintf(stderr, "Dump layout: rows = %d cols = %d\n", depth, width); if (tree_debug) fflush(stderr); if (vertical) { /* apply vertical shear first */ offset = 1; for (col = 1; col < 256; col++) { // move this column down by 'offset' for (row = 255; row > offset; row--) { pic[row][col] = pic[row-offset][col]; pic[row-offset][col] = ' '; } offset += 1; } } if (horizontal) { /* apply horizontal shear next */ row = 255; // start at bottom of drawing offset = 0; for (;;) { static long temp[1024]; for (col = 0; col < 256; col++) { temp[col] = ' '; } for (col = 0; col < 256; col++) { temp[col*2+offset] = pic[row][col]; temp[col*2+offset+1] = ' '; } for (col = 0; col < 256; col++) { pic[row][col] = temp[col]; } if (row == 0) break; offset += 1; /* more shear on next row up */ row -= 1; } } if (trim) { trimmable = (0==0); for (;;) { for (row = 0; row < 256; row++) { if (pic[row][0] != ' ') { trimmable = (0!=0); break; } } if (!trimmable) break; for (row = 0; row < 256; row++) { for (col = 0; col+1 < 256; col++) { pic[row][col] = pic[row][col+1]; } pic[row][255] = ' '; } } } if (wide) { /* apply widening last */ row = 255; // start at bottom of drawing offset = 0; for (;;) { static long temp[1024]; for (col = 0; col < 256; col++) { temp[col] = ' '; } for (col = 0; col < 256; col++) { temp[col*2+offset] = pic[row][col]; temp[col*2+offset+1] = ' '; } for (col = 0; col < 256; col++) { pic[row][col] = temp[col]; } if (row == 0) break; row -= 1; } } /* display tree */ for (row = 0; row < 256; row++) { trimmable = (0 == 0); for (col = 0; col < 256; col++) { if (pic[row][col] != ' ') { trimmable = (0!=0); break; } } if (!trimmable) { fprintf(stdout, " "); // INDENT for (col = 255; col >= 0; col--) { if (pic[row][col] != ' ') break; pic[row][col] = '\0'; } printf(" // "); for (col = 0; col < 256; col++) { if ((pic[row][col] < -128) || (pic[row][col] > 255)) { oldtextblit(row, col, (char *)pic[row][col]); } else if (pic[row][col] == '\0') break; putchar(pic[row][col]); } putchar('\n'); } } putchar('\n'); fflush(stdout); return; } /*>*//*>*/ #endif /* DEBUG_TREES */ /*>*/ /*< Parsing. (Table driven top-down recursive-descent parser) */ /*< The parser used here is based on a design by Tony Brooker which was originally used in Atlas Autocode and the "Compiler Compiler". It generates a concrete syntax tree rather than the abstract syntax tree more popular in modern compilers. A later phase converts from concrete to abstract. Note that the parsing procedure here is just a piece of code to walk a pre-built table. There is nothing in this section which reflects the grammar, if that is what you are looking for. You'll find the grammar embedded in the 'compile()' procedure in the following section. */ int cp = 0; // code pointer. Has to be global state. int ap = 0; // Analysis record pointer. int parse(int pp, int depth) // depth is only for indentation in diags { /*< Main parsing procedure. This is a table-driven parser, with the tables being generated from the grammar rules embedded in the 'compile' procedure below. The result of the parse is a tree structure, and the values of the nodes in the tree structure are used to drive a large 'case' statement which selects the actions to be performed after a successful parse. There is no grammatical structure embedded in this procedure. If you're looking for the grammar definition, see the procedure called 'compile' instead. */ int saved_cp, saved_ap, i, gp, alts, match; char saved_desc[256]; /*< Initialisation */ gp = phrase_start[pp-512-MAX_BIP]; alts = gram[gp]; /*< Debugging */ #ifdef DEBUG_PARSER if (debug_parser) { fprintf(stdout, "\n"); indent(depth, stdout); fprintf(stdout, "Phrase %s/%d (%d alternatives) = ", phrasename[pp-512], pp, alts); fflush(stdout); } #endif /*>*/ gp++; // gp now points to first element (length) of first alt saved_cp = cp; saved_ap = ap; /*>*/ for (i = 0; i < alts; i++) { /*< Starting with the root phrase, recursively examine each alternative */ int each, phrases = gram[gp++], phrase_count, gap = 0; cp = saved_cp; ap = saved_ap; if (ap+3 > next_free_a) next_free_a = ap+3; makespace(A, next_free_a, a_size); A[ap++] = pp; // record which phrase (could be done outside loop) A[ap++] = i; // and which alt. // Count slots needed. *Could* be precalculated and stored // in the grammar, either embedded (after the ALT) or as a // separate table for (each = 0; each < phrases; each++) if (gram[gp+each] >= 512) gap++; A[ap++] = gap; // Count of alts (gap) // ap+gap now points to the slot after the space required, which // is where the first subphrase will be stored. ap = ap+gap; // recursive subphrases are stored after this phrase. // ap is left updated if successful. // successfully parsed phrases are stored in A[saved_ap+3+n] if (saved_ap+3+gap > next_free_a) next_free_a = saved_ap+3+gap; makespace(A, next_free_a, a_size); /*< Debug */ // this loop is only for diagnostics #ifdef DEBUG_PARSER if (debug_parser) { char *saved_descp; fprintf(stdout, "\n"); indent(depth, stdout); fprintf(stdout, "Alternative %d: (%d phrases) ", i+1, phrases); saved_descp = saved_desc; *saved_descp = '\0'; for (each = 0; each < phrases; each++) { int phrase = gram[gp+each]; if (phrase < 256) { saved_descp += sprintf(saved_descp, " '%c'", phrase); } else if (phrase < 512) { saved_descp += sprintf(saved_descp, " \"%s\"/%d", keyword[phrase-256], phrase-256); } else if (phrase < 512+MAX_BIP) { saved_descp += sprintf(saved_descp, " {%s/BIP%d}", phrasename[phrase-512], BIP[phrase-512]); } else { saved_descp += sprintf(saved_descp, " <%s/%d>", phrasename[phrase-512], phrase); } } fprintf(stdout, "%s\n", saved_desc); fflush(stdout); } #endif /*>*/ match = TRUE; // stays true if all subphrases match phrase_count = 0; // only phrases which make it into the A record, // i.e. excluding literals and keywords for (each = 0; each < phrases; each++) { /*< Within a single grammar rule (alternative), ensure that each subphrase is present */ int phrase = gram[gp+each]; /*< Debug */ #ifdef DEBUG_PARSER if (debug_parser) { indent(depth, stdout); fprintf(stdout, "Input token stream = '%s' '%s' '%s' ...\n", (cp < nextfree ? c[cp].s : "EOF"), (cp+1 < nextfree ? c[cp+1].s : "EOF"), (cp+2 < nextfree ? c[cp+2].s : "EOF")); } #endif if (cp > bestparse) { static char s[128]; #ifdef DEBUG_PARSER if (phrase < 256) { sprintf(s, "'%c'", phrase); } else if (phrase < 512) { sprintf(s, "\"%s\"", keyword[phrase-256]); } else if (phrase < 512+MAX_BIP) { sprintf(s, "{%s}", phrasename[phrase-512]); } else { sprintf(s, "<%s>", phrasename[phrase-512]); } #endif looking_for = s; bestparse = cp; } #ifdef DEBUG_PARSER if (debug_parser) indent(depth, stdout); #endif /*>*/ if (phrase < 256) { /*< Literal */ #ifdef DEBUG_PARSER if (debug_parser) fprintf(stdout, "'%c'", phrase); #endif if ((c[cp].t != TYPE_CHAR) || (c[cp].s[0] != phrase)) match = FALSE; else cp++; // Don't record literals /*>*/ } else if (phrase < 512) { /*< Keyword */ #ifdef DEBUG_PARSER if (debug_parser) fprintf(stdout, "\"%s\"/%d", keyword[phrase-256], phrase-256); #endif if (strcmp(keyword[phrase-256], c[cp].s) != 0) match = FALSE; else cp++; // Don't record keywords /*>*/ } else if (phrase < 512+MAX_BIP) { /*< Built-in phrase */ int where = ap; // next phrase to be parsed will be stored at current 'ap'. #ifdef DEBUG_PARSER if (debug_parser) fprintf(stdout, "{%s/BIP%d}", phrasename[phrase-512], BIP[phrase-512]); #endif if (c[cp].t != BIP[phrase-512]) match = FALSE; else { A[ap++] = phrase; A[ap++] = 1; A[ap++] = 1; A[ap++] = cp++; A[saved_ap+3+phrase_count++] = where; // Record BIP }/*>*/ } else { /*< Recursive call to parser for a subphrase */ int where = ap; // next phrase to be parsed will be stored at current 'ap'. #ifdef DEBUG_PARSER if (debug_parser) fprintf(stdout, "<%s/%d>", phrasename[phrase-512], phrase); #endif if (!parse(phrase, depth+1)) match = FALSE; else { A[saved_ap+3+phrase_count++] = where; } /*>*/ } /*< debug */ #ifdef DEBUG_PARSER if (debug_parser) { fprintf(stdout, "\n"); indent(depth, stdout); fprintf(stdout, "Tried alternative %d: %s - result was %s\n", each+1, saved_desc, (match ? "TRUE" : "FALSE")); fflush(stdout); } #endif /*>*/ if (!match) break; /*>*/ } gp += phrases; // move over all phrases, to next alt if (match) break; #ifdef DEBUG_PARSER else if (debug_parser) { indent(depth, stdout); fprintf(stdout, "** Alternative %d FAILED.\n", i+1); } #endif // gp now points to first element (length) of next alt, or start of next phrase /*>*/ } return(match); /*>*/ } /*>*//*>*/ /*< Convert the concrete syntax tree into an abstract syntax tree. (the grammar itself is also embedded in this section) */ /*< This is primarily the main 'compile()' procedure, which is actually where the grammar of the language is defined. The grammar is extracted from in-line comments, and converted into a table by the 'takeon' program which you can find in the same directory as this file. (You can view the extracted grammar in file "varcalc.g") The style of compiler on which this design is based actually goes directly from concrete syntax tree to code generation - but that was from the days when memory was tight. Since most modern compilers - and especially books about them - are AST-based, we'll take that extra step here in order to give our students an AST-based compiler to experiment with. */ /*< Compiler tree-walking support code */ /*< In previous compilers, I had to write custom code for every tree-based optimisation, in order to walk down the tree to the right place to find the leaves to be optimised. In this one, I have a generic tree-walking procedure which can walk the entire program, but it can be customised so that it takes action only on specific phrases. This is possible in this design only because each set of subphrases stores the count of subphrases befoe it - thus allowing a generic tree-walking procedure that doesn't have to know what each node consists of until it happens across a node of the type it is looking for. However this only walks the concrete syntax tree - there's a separate Walk_AST() procedure to do the same to the AST. */ void walk_analysis_record(int ap, int depth, int wanted(int phraseno), void perform(int ap, int before, int depth)) { int i; if (wanted(A[ap])) perform(ap, TRUE, depth); for (i = 3; i < A[ap+2]+3; i++) { if (A[A[ap+i]] >= 512+MAX_BIP) walk_analysis_record(A[ap+i], depth+1, wanted, perform); } if (wanted(A[ap])) perform(ap, FALSE, depth); } int want_all(int phraseno) { return TRUE; } void print_all(int ap, int before, int depth) { #ifdef DEBUG_PARSER int saved_ap = ap; int phrase = A[ap++]; int alt = A[ap++]; int phrases = A[ap++]; // defined subphrases int i; indent(depth, stderr); fprintf(stderr, "<%s%s/%d> ", (before ? "" : "/"), phrasename[phrase-512], alt); for (i = 0; i < (3+phrases); i++) { fprintf(stderr, "A[%d] = %d, ", saved_ap+i, A[saved_ap+i]); } fprintf(stderr, "\n"); #endif } /* This is similar to the previous walking procedure for the concrete syntax tree. Under development, not yet debugged. Problem may be conceptual - is it always safe to walk all children of a node? I think we need a bitmap array of flags for each node to say whether each child is walkable or a terminal. Ideally terminals would only be on ops with no children... The tree-printing code also needs this same improvement - for the moment it only the first two children that have information in a special-case table... (see tables display_children/display_leftchild etc) */ void Walk_AST(TRIP p, int depth, int wanted(TRIP p), void perform(TRIP p, int before, int depth)) { int i; if ((p == -1) || (AST[p] == -1)) return; //fprintf(stderr, "Walk_AST(%d): %s\n", p, name[opsym(p)]); if (wanted(p)) perform(p, TRUE, depth); // the extra tests below are rather hacky and could be removed if the AST were // better defined, so that only teminal nodes contained content other than pointers to other AST nodes for (i = 1; i < tripsize(p); i++) { if (nthchild(p, i) != -1) { //fprintf(stderr, " child(%d): %s\n", i, name[opsym(nthchild(p, i))]); if (display_children[opsym(nthchild(p, i))] > 0) Walk_AST(nthchild(p, i), depth+1, wanted, perform); } } if (wanted(p)) perform(p, FALSE, depth); } int want_decls(TRIP p) { return ((opsym(p) == DECLAREARRAY) || (opsym(p) == DECLARESCALAR)); } int want_procs(TRIP p) { return (opsym(p) == DEFFN); } int test_DEFFN_or_NOOP(TRIP p) { return ((opsym(p) == DEFFN) || (opsym(p) == NOOP) || want_decls(p)); } void hide_one_DEFFN_and_DECLS(TRIP p, int before, int depth) { if (before) { if (opsym(p) == DEFFN) opsym(p) = NOOP; else if (opsym(p) == DECLARESCALAR) opsym(p) = NOOP; else if (opsym(p) == DECLAREARRAY) opsym(p) = NOOP; } } void hide_DEFFN_and_NOOP_and_DECLS(TRIP root) { Walk_AST(root, 0, test_DEFFN_or_NOOP, hide_one_DEFFN_and_DECLS); } int test_return(TRIP p) { return (opsym(p) == RETURN); } static int return_type; void record_return_types(TRIP p, int before, int depth) { if (before) { return_type = leftchild(p); } } int locate_returns(TRIP procroot) { return_type = -1; // will be overridden if a result is returned Walk_AST(procroot, 0, test_return, record_return_types); return return_type; } void codegen_stack(TRIP root); void output_stack_code(TRIP p, int before, int depth) { if (!before) return; codegen_stack(p); } void codegen_c(TRIP root); void output_c(TRIP p, int before, int depth) { if (before) { codegen_c(p); } } void output_top_level_statements(int ap, int depth, void perform(int ap, int depth)) { int i; if ((ap == -1) || (AST[ap] == -1)) return; if (opsym(ap) == SEQUENCE) { output_top_level_statements(leftchild(ap), depth+1, perform); output_top_level_statements(rightchild(ap), depth+1, perform); } else perform(ap, depth); } void print_all_AST(int ap, int before, int depth) { int saved_ap = ap; int phrase = A[ap++]; int alt = A[ap++]; int phrases = A[ap++]; // defined subphrases int i; indent(depth, stderr); fprintf(stderr, "<%s%s/%d> ", (before ? "" : "/"), phrasename[phrase-512], alt); for (i = 0; i < (3+phrases); i++) { fprintf(stderr, "A[%d] = %d, ", saved_ap+i, A[saved_ap+i]); } fprintf(stderr, "\n"); } void print_trees(int ap, int depth) { #ifdef DEBUG_PARSER #ifdef DRAW_TREES if (opsym(ap) != SEQUENCE) { draw_tree(ap); return; } if (opsym(leftchild(ap)) != SEQUENCE) { draw_tree(leftchild(ap)); } if (opsym(rightchild(ap)) != SEQUENCE) { draw_tree(rightchild(ap)); } #endif #endif } /*>*//*>*/ TRIP compile(int ap, int depth) { /*< Main code-generation procedure. This is called after parsing, with parameters which describe the parse tree. By jumping to the corresponding statement in the large 'case' below, we execute the actions associated with the parse-tree nodes. The grammar which was used to build the parse tables is extracted from the source below (from comments marked with "//\\") and the tables are built with the associated 'takeon' program. (See this same directory for the source. It's quite short...) */ int saved_ap; int phrase; // A[ap] is the phrase number. A[ap+1] is the alt. int alt; // For consistency, in BIPs, the Alt should always be 1 // although that may not be the case at the moment :-( int phrases; // defined subphrases int i; TRIP t1, t2, t3, t4, t5; // Temporaries // The following ecce command executed on this file will generate varcalc.g: // ecce -c "(v.//\\.s..(v/ /e)?m,k)0;%c" varcalc.c varcalc.g // May later tweak takeon.c to read from varcalc.c rather than varcalc.g // thus simplifying build process and becoming more like yacc. saved_ap = ap; phrase = A[ap++]; alt = A[ap++]; phrases = A[ap++]; #ifdef DEBUG // fprintf(stdout, "compile(A[%d], %d) phrase=%s\n", saved_ap, depth, phrasename[phrase-512]); #endif switch (phrase) { /*< Built-in phrases */ //\\ # BIPS (Built-in Phrases) are linked to the type-code returned //\\ # by the line-reconstruction code (aka lexer) //\\ //\\ # These *must* come first. // See TYPE_* in first page for the values to use. //\\ //\\ B<IDENT>=1; case P_IDENT: // NEED A makevar CALL! if (c[A[ap]].l > latest_line) latest_line = c[A[ap]].l; ////////////////////////////////////////////////////////fprintf(stdout, "%d: %s\n", c[A[ap]].l, c[A[ap]].s); return new_or_existingtag(c[A[ap]].s); //\\ B<NUM>=5; case P_NUM: if (c[A[ap]].l > latest_line) latest_line = c[A[ap]].l; ////////////////////////////////////////////////////////fprintf(stdout, "%d: %s\n", c[A[ap]].l, c[A[ap]].s); return make_int_const(/*INT*/TYPE_INT, c[A[ap]].s); //\\ B<CHARLIT>=3; case P_CHARLIT: if (c[A[ap]].l > latest_line) latest_line = c[A[ap]].l; ////////////////////////////////////////////////////////fprintf(stdout, "%d: %s\n", c[A[ap]].l, c[A[ap]].s); return make_binary_tuple(CONST, /*INT*/TYPE_CHARCONST, (int)*c[A[ap]].s); // NEEDS TO BE DECORATED WITH THE INFO THAT THIS WAS AN ASCII SYMBOL! /*>*/ //\\ //\\ # Phrase definitions. PROGRAM is the main entry point. //\\ case P_PROGRAM: //\\ P<PROGRAM> = <DECLARATIONS> <SSLIST>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; return sequence(t1, t2); // break; case P_DECLARATIONS: //\\ P<DECLARATIONS> = <VARDECL> <DECLARATIONS>, //\\ <PROCDECL> <DECLARATIONS>, //\\ <DEFFN> <DECLARATIONS>, //\\ <ARRAY> <DECLARATIONS>, //\\ <EXTERN> <DECLARATIONS>, //\\ ; if (alt == 5) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; return sequence(t1, t2); // break; case P_PROCDECLARATIONS: //\\ P<PROCDECLARATIONS> = <VARDECL> <DECLARATIONS>, //\\ <ARRAY> <DECLARATIONS>, //\\ <EXTERN> <DECLARATIONS>, //\\ ; if (alt == 3) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; return sequence(t1, t2); // break; case P_PROCDECL: //\\ P<PROCDECL> = "proc" <EXISTINGVAR> '(' <ARGLIST> ')' <PROCDECLARATIONS> <SSLIST> "end" <OPTSEMI>; // some <SS>'s need to checked for and rejected semantically rather than syntactically t1 = compile(A[ap], depth+1); // procname t2 = compile(A[ap+1], depth+1); // args t3 = compile(A[ap+2], depth+1); // decls - some invalid t4 = compile(A[ap+3], depth+1); // sslist - ditto if (t3 == -1) t5 = t4; else t5 = sequence(t3, t4); return make_nary_tuple(DEFFN, t1, t2, t5); case P_SS: // simple statement that can occur almost anywhere //\\ //\\ P<SS> = <EXISTINGVAR> '=' <BOOLEXPR> <OPTIF> <OPTSEMI>, if (alt == 0) { t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = make_binary_tuple(ASSIGNSCALAR, t1, t2); t4 = compile(A[ap+2], depth+1); if (t4 != -1) { rightchild(t4) = t3; // plug the 'then' part into the IFTHEN opcode return t4; } return t3; } else if (alt == 1) { //\\ <EXISTINGVAR> '[' <EXPR> ']' '=' <BOOLEXPR> <OPTIF> <OPTSEMI>, t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); t4 = make_nary_tuple(ASSIGNARRAY, t1, t2, t3); t5 = compile(A[ap+3], depth+1); if (t5 != -1) { rightchild(t5) = t4; // plug the 'then' part into the IFTHEN opcode return t5; } return t4; } else if (alt == 2) { //\\ <PROCCALL> <OPTIF> <OPTSEMI>, t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; rightchild(t2) = t1; return t2; } else if ((alt == 3) || (alt == 4)) { //\\ <IFSEQ> <OPTSEMI>, //\\ <LOOP> <OPTSEMI>, return compile(A[ap], depth+1); } else if ((alt == 5) || (alt == 6)) { //\\ "while" '(' <BOOLEXPR> ')' "do" <SSLIST> "endwhile" <OPTSEMI>, //\\ "loop" <SSLIST> "endloop" <OPTSEMI>, // while, loop - NOT IMPL. return -1; } else if (alt == 7) { // return var - error if not in a function //\\ "return" <BOOLEXPR> <OPTIF> <OPTSEMI>, t1 = make_unary_tuple(RETURN, compile(A[ap], depth+1)); // function result t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; rightchild(t2) = t1; return t2; } else if (alt == 8) { // error if not in a procedure //\\ "return" <OPTIF> <OPTSEMI>, t1 = make_unary_tuple(RETURN, -1); // procedure return t2 = compile(A[ap], depth+1); if (t2 == -1) return t1; rightchild(t2) = t1; return t2; } else if (alt == 9) { // jump //\\ "jump" <EXISTINGVAR> <OPTIF> <OPTSEMI> t1 = make_unary_tuple(GOTO, compile(A[ap], depth+1)); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; rightchild(t2) = t1; return t2; } //\\ ; case P_SSLIST: //\\ P<SSLIST> = <OPTLABELS> <SS> <SSLIST>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); if (t1 == -1) t4 = t2; else t4 = sequence(t1, t2); if (t3 == -1) return t4; return sequence(t4, t3); case P_CONSTDECL: //\\ //\\ P<CONSTDECL> = "const" <EXISTINGVAR> '=' <NUMBER> <OPTSEMI>; return -1; // not implemented case P_VARDECL: //\\ //\\ P<VARDECL> = "var" <VARDEC> <VARDECLIST> <OPTSEMI>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; return sequence(t1, t2); case P_VARDEC: // = <BOOLEXPR> ????????????????? //\\ //\\ P<VARDEC> = <IDENT> '=' <BOOLEXPR>, //\\ <IDENT>; t1 = compile(A[ap], depth+1); if (alt == 0) { t2 = compile(A[ap+1], depth+1); t3 = make_binary_tuple(VAR, t1, INT); t4 = make_binary_tuple(DECLARESCALAR, t1, t2); t5 = make_binary_tuple(ASSIGNSCALAR, t3, t2); return sequence(t4, t5); // Or should we subsume the ASSIGN into the DECLARE? } else if (alt == 1) { return make_binary_tuple(DECLARESCALAR, t1, -1 /* no init */); } case P_OPTLABELS: // 0 or more labels //\\ P<OPTLABELS> = <LABEL> <OPTLABELS>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; return sequence(t1, t2); case P_LABEL: //\\ P<LABEL> = <EXISTINGVAR> ':'; return make_unary_tuple(LABEL, compile(A[ap], depth+1)); //\\ P<SYNTAXERROR> = ; case P_SYNTAXERROR: // also not implemented yet fprintf(stderr, "*** Syntax error. Details later.\n"); exit(1); //\\ P<EXISTINGVAR> = <IDENT>; case P_EXISTINGVAR: t1 = compile(A[ap], depth+1); return getvar_from_tag(t1); // return make_binary_tuple(VAR, t1, INT); //zxcv WRONG!!! Needs to *find* existing tuple with this tag - need proper symbol table management TODO BUG use getvar? //\\ P<NEWVAR> = <IDENT>; case P_NEWVAR: t1 = compile(A[ap], depth+1); return make_binary_tuple(VAR, t1, INT); case P_EXTERN: //\\ P<EXTERN> = "extern" <EXISTINGVAR> <OPTSEMI>; return -1; // not implemented case P_ARGLIST: //\\ P<ARGLIST> = <EXISTINGVAR> <ARGLIST>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(DEFPARAM, t1, t2); case P_VARDECLIST: //\\ //\\ P<VARDECLIST> = ',' <VARDEC> <VARDECLIST>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; return sequence(t1, t2); case P_ARRAY: //\\ //\\ P<ARRAY> = "const" "array" <EXISTINGVAR> '[' <NUM> ']' '=' <INITLIST> <OPTSEMI>, //\\ "array" <IDENT> '[' <NUM> ']' <OPTSEMI>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (alt == 0) { t3 = compile(A[ap+2], depth+1); } else t3 = -1; return make_nary_tuple(DECLAREARRAY, t1, t2, t3); case P_INITLIST: //\\ //\\ P<INITLIST> = <NUM> <INITLIST>, //\\ ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(PARAM, t1, t2); case P_DEFFN: //\\ //\\ P<DEFFN> = "let" <EXISTINGVAR> '(' <FORMALS> ')' '=' <BOOLEXPR> <OPTSEMI>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); return make_nary_tuple(DEFFN, t1, t2, t3); case P_FORMALS: //\\ //\\ P<FORMALS> = <EXISTINGVAR> <RESTOFFORMALS>; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(DEFPARAM, t1, t2); case P_RESTOFFORMALS: //\\ //\\ P<RESTOFFORMALS> = ',' <EXISTINGVAR> <RESTOFFORMALS>, ; if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(DEFPARAM, t1, t2); case P_IFSEQ: //\\ P<IFSEQ> = "if" <BOOLEXPR> <OPTSEMI> <SSLIST> <OPTELSEPART> "finish"; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+2], depth+1); t3 = compile(A[ap+3], depth+1); if (t3 == -1) return make_binary_tuple(IFTHEN, t1, t2); return make_nary_tuple(IFTHENELSE, t1, t2, t3); case P_OPTELSEPART: // an elseif is returned as if it were a simple elsepart, which may contain a nested if/then/else //\\ P<OPTELSEPART> = "else" "if" <BOOLEXPR> <OPTSEMI> <SSLIST> <OPTELSEPART>, //\\ "elseif" <BOOLEXPR> <OPTSEMI> <SSLIST> <OPTELSEPART>, //\\ "else" <OPTSEMI> <SSLIST>, //\\ ; if (alt == 3) return -1; if (alt == 2) return compile(A[ap+1], depth+1); t1 = compile(A[ap], depth+1); t2 = compile(A[ap+2], depth+1); t3 = compile(A[ap+3], depth+1); if (t3 == -1) return make_binary_tuple(IFTHEN, t1, t2); return make_nary_tuple(IFTHENELSE, t1, t2, t3); case P_LOOP: // Change this: pull the condition out of the OPTIF and adjust the codegen where it is used to match. Remember this is an AST not a CST. //\\ P<LOOP> = "cycle" <OPTSEMI> <SSLIST> "repeat" <OPTIF>; t1 = compile(A[ap+1], depth+1); t2 = compile(A[ap+2], depth+1); return make_binary_tuple(REPEATIF, t1, t2); case P_PROCCALL: //\\ //\\ P<PROCCALL> = <EXISTINGVAR> '(' <PARAMLIST> ')'; t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool) t2 = compile(A[ap+1], depth+1); // expr param t3 = leftchild(t1); // TAG (fn name) t4 = rightchild(t3); // stringpool offset return make_binary_tuple(FNCALL, make_proc_name(t4), t2); case P_OPTIF: //\\ //\\ P<OPTIF> = "if" <BOOLEXPR>, //\\ ; if (alt == 1) return -1; return make_binary_tuple(IFTHEN, compile(A[ap], depth+1), -1); // 'then' part is a hole to be filled. case P_BOOLEXPR: //\\ //\\ P<BOOLEXPR> = <BOOLTERM> <RESTOFBOOLTERM>; /*< (Click here to expand the code) */ // 'restofboolterm' will be a binary node of which the left branch is empty - // we need to plug that branch with 'boolterm'. t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; // no restof... leftchild(t2) = t1; return t2; /*>*/ case P_RESTOFBOOLTERM: //\\ //\\ P<RESTOFBOOLTERM> = <BOROP> <BOOLTERM> <RESTOFBOOLTERM>, //\\ ; /*< (Click here to expand the code) */ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); if (t3 == -1) { return make_binary_tuple(leftchild(t1), -1, t2); // leave a hole for the left op from the parent } // 'restofboolterm' will be a binary node of which the left branch is empty - // we need to plug that branch with 'boolterm'. leftchild(t3) = t2; return make_binary_tuple(leftchild(t1), -1, t3); /*>*/ case P_BOOLTERM: //\\ //\\ P<BOOLTERM> = <BOOLFACTOR> <RESTOFBOOLFACTOR>; /*< (Click here to expand the code) */ t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; // no restof... leftchild(t2) = t1; return t2; /*>*/ case P_RESTOFBOOLFACTOR: //\\ //\\ P<RESTOFBOOLFACTOR> = <BANDOP> <BOOLFACTOR> <RESTOFBOOLFACTOR>, //\\ ; /*< (Click here to expand the code) */ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); if (t3 == -1) return make_binary_tuple(leftchild(t1), -1, t2); // no restof... Parent will plug in left-hand operator. leftchild(t3) = t2; return make_binary_tuple(leftchild(t1), -1, t3); /*>*/ case P_BOOLFACTOR: //\\ //\\ P<BOOLFACTOR> = <OPTNOT> <RELATION>; /*< (Click here to expand the code) */ t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t1 == -1) return t2; return make_unary_tuple(leftchild(t1), t2); /*>*/ case P_OPTNOT: //\\ //\\ P<OPTNOT> = '!', //\\ ; if (alt == 0) return mkop(NOT); return -1; case P_OPTSEMI: //\\ //\\ P<OPTSEMI> = ';', //\\ ; return -1; case P_RELATION: //\\ //\\ P<RELATION> = <EXPR> <RESTOFRELATION>; /*< (Click here to expand the code) */ t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; // no restof... leftchild(t2) = t1; return t2; /*>*/ case P_RESTOFRELATION: //\\ //\\ P<RESTOFRELATION> = <RELOP> <EXPR>, //\\ ; /*< (Click here to expand the code) */ // implicitly "<expr> != 0", if no relop given. if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(leftchild(t1), -1, t2); /*>*/ case P_EXPR: //\\ //\\ P<EXPR> = <SUM> <RESTOFEXPR>; /*< (Click here to expand the code) */ t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; leftchild(t2) = t1; return t2; /*>*/ case P_RESTOFEXPR: //\\ //\\ P<RESTOFEXPR> = <SHIFTOP> <SUM> <RESTOFEXPR>, //\\ ; /*< (Click here to expand the code) */ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); if (t3 == -1) return make_binary_tuple(leftchild(t1), -1, t2); // no restof... Parent will plug in left-hand operator. leftchild(t3) = t2; return make_binary_tuple(leftchild(t1), -1, t3); /*>*/ case P_SUM: //\\ //\\ P<SUM> = <OPTUNOP> <TERM> <RESTOFSUM>; /*< (Click here to expand the code) */ // CHECK THIS ONE! Is it a unary addop???? t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t1 != -1) t2 = make_unary_tuple(leftchild(t1), t2); t3 = compile(A[ap+2], depth+1); if (t3 == -1) return t2; // no restof... leftchild(t3) = t2; return t3; /*>*/ case P_UNOP: //\\ //\\ P<UNOP> = '+', '-', '\\'; return (alt == 0 ? -1 : (alt == 1 ? mkop(NEG) : mkop(NOT))); case P_OPTUNOP: //\\ //\\ P<OPTUNOP> = <UNOP>, //\\ ; /*< (Click here to expand the code) */ if (alt == 1) return -1; return compile(A[ap], depth+1); /*>*/ case P_ADDOP: // DO NOT USE AS UNARY!!! //\\ //\\ P<ADDOP> = '+', '-'; return (alt ==0 ? mkop(ADD) : mkop(SUB)); case P_OPTADDOP: //\\ //\\ P<OPTADDOP> = <ADDOP>, //\\ ; /*< (Click here to expand the code) */ if (alt == 1) return -1; return compile(A[ap], depth+1); /*>*/ case P_RESTOFSUM: //\\ //\\ P<RESTOFSUM> = <ADDOP> <TERM> <RESTOFSUM>, //\\ ; /*< (Click here to expand the code) */ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); if (t3 == -1) return make_binary_tuple(leftchild(t1), -1, t2); // no restof... Parent will plug in left-hand operator. leftchild(t3) = t2; return make_binary_tuple(leftchild(t1), -1, t3); /*>*/ case P_TERM: //\\ //\\ P<TERM> = <BITFACTOR> <RESTOFTERM>; /*< (Click here to expand the code) */ t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; // no restof... leftchild(t2) = t1; return t2; /*>*/ case P_RESTOFTERM: //\\ //\\ P<RESTOFTERM> = <MULOP> <BITFACTOR> <RESTOFTERM>, //\\ ; /*< (Click here to expand the code) */ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); if (t3 == -1) return make_binary_tuple(leftchild(t1), -1, t2); // no restof... Parent will plug in left-hand operator. leftchild(t3) = t2; return make_binary_tuple(leftchild(t1), -1, t3); /*>*/ //---------------- adding: case P_BITFACTOR: //\\ //\\ P<BITFACTOR> = <FACTOR> <RESTOFBITFACTOR>; /*< (Click here to expand the code) */ t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); if (t2 == -1) return t1; // no restof... leftchild(t2) = t1; return t2; /*>*/ case P_RESTOFBITFACTOR: //\\ //\\ P<RESTOFBITFACTOR> = <BITOP> <FACTOR> <RESTOFBITFACTOR>, //\\ ; /*< (Click here to expand the code) */ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); t3 = compile(A[ap+2], depth+1); if (t3 == -1) return make_binary_tuple(leftchild(t1), -1, t2); // no restof... Parent will plug in left-hand operator. leftchild(t3) = t2; return make_binary_tuple(leftchild(t1), -1, t3); /*>*/ case P_BITOP: // STILL TO DO 'OR'... //\\ //\\ P<BITOP> = '&'; return mkop(LAND); //---------------- case P_BOROP: //\\ //\\ P<BOROP> = '||'; return mkop(BOR); case P_BANDOP: //\\ //\\ P<BANDOP> = '&&'; return mkop(BAND); case P_RELOP: //\\ //\\ P<RELOP> = '<>', '<=', '<', '>=', '>', '!=', '==', '#', '='; switch (alt) { case 0: case 7: case 5: return mkop(NE); case 1: return mkop(LE); case 2: return mkop(LT); case 3: return mkop(GE); case 4: return mkop(GT); case 6: case 8: return mkop(EQ); } case P_SHIFTOP: //\\ //\\ P<SHIFTOP> = '<<', '>>'; return (alt == 0 ? mkop(LSH) : mkop(RSH)); case P_MULOP: //\\ //\\ P<MULOP> = '*', '/'; return (alt == 0 ? mkop(MUL) : mkop(DIV)); case P_FACTOR: //\\ //\\ P<FACTOR> = '(' <BOOLEXPR> ')', //\\ <EXISTINGVAR> '(' <PARAMLIST> ')', //\\ <EXISTINGVAR> '[' <EXPR> ']', //\\ <EXISTINGVAR>, //\\ <NUMBER>; if (alt == 0) { return compile(A[ap], depth+1); } else if (alt == 1) { t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool) t2 = compile(A[ap+1], depth+1); // expr param t3 = leftchild(t1); // TAG (fn name) t4 = rightchild(t3); // stringpool offset return make_binary_tuple(FNCALL, make_proc_name(t4), t2); } else if (alt == 2) { // indexed array element t1 = compile(A[ap], depth+1); // VAR (ident is a var; leftchild of ident is a tag; rightchild of tag is index into stringpool) t2 = compile(A[ap+1], depth+1); return make_binary_tuple(INDEX, t1, t2); } else if (alt == 3) { return compile(A[ap], depth+1); } else { // alt = 4 return compile(A[ap], depth+1); } case P_PARAMLIST: //\\ //\\ P<PARAMLIST> = <EXPR> <RESTOFPARAMLIST>, //\\ ; //\\ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(PARAM, t1, t2); case P_RESTOFPARAMLIST: //\\ //\\ P<RESTOFPARAMLIST> = ',' <EXPR> <RESTOFPARAMLIST>, //\\ ; //\\ if (alt == 1) return -1; t1 = compile(A[ap], depth+1); t2 = compile(A[ap+1], depth+1); return make_binary_tuple(PARAM, t1, t2); case P_NUMBER: //\\ //\\ P<NUMBER> = <NUM>, //\\ <UNOP> <NUM>, //\\ <CHARLIT>; t1 = compile(A[ap], depth+1); if (alt == 1) { return make_unary_tuple(leftchild(t1), compile(A[ap+1], depth+1)); } return t1; //\\ //\\ E //\\ # 'E' is end of grammar. Everything after this is ignored. default: fprintf(stderr, "*** Internal error at line %d. ap=%d phrase=%d", __LINE__, ap, phrase); if (((phrase-512) >= 0) && ((phrase-512) < MAX_PHRASE)) fprintf(stderr, "\n (possible missing \"case P_%s:\" in compile()?)", phrasename[phrase-512]); fprintf(stderr, "\n"); #ifdef DEBUG t5=t5/0; #endif exit(2); } return(-1); // DUMMY TRIP, NOTHING TO RETURN /*>*/ } /*>*//*>*/ /*< Code generators - one for three-address, one for a stack machine, one that generates structured C, and one run-time interpreter. At the moment, function definitions and calls have only been added to the stack-based machine and the C translator */ static int nextlab = 0; /*< basic codegen_three_address() generates a simple 3-address intermediate code */ // Codegen is the guts of the compiler, which effectively serialises // the AST into sequentially executable statements. A further phase // is required to actually generate executable code. /*< Support I/O procs for codegen */ void declare(char *name) { fprintf(stdout, " int %s;\n", name); } void loadconst(char *format, TRIP dest, int value) { fprintf(stdout, " "); fprintf(stdout, format, dest, value); } void loadvar(char *format, TRIP dest, char *varname) { fprintf(stdout, " "); fprintf(stdout, format, dest, varname); } void store(char *format, char *varname, TRIP source) { fprintf(stdout, " "); fprintf(stdout, format, varname, source); } void operate(char *format, TRIP dest, TRIP leftop, char *op, TRIP rightop) { fprintf(stdout, " "); fprintf(stdout, format, dest, leftop, op, rightop); } void monoperate(char *format, TRIP dest, char *op, TRIP leftop) { fprintf(stdout, " "); fprintf(stdout, format, dest, op, leftop); } void put_goto(int lab) { fprintf(stdout, " "); fprintf(stdout, "goto L%02d;\n", lab); } void put_ifgoto(int cond, int lab, int sense) { fprintf(stdout, " "); fprintf(stdout, "if (%s_t%d) goto L%02d;\n", (sense ? "" : "!"), cond, lab); } void input(TRIP i) { /* TACC BUG! CANNOT ESCAPE double quotes properly */ fprintf(stdout, " "); fprintf(stdout, "fprintf(stderr, %c%s: %c); fflush(stderr); fscanf(stdin, %c%%d%c, &%s);\n", '"', nameof(leftchild(i)), '"', '"', '"', nameof(leftchild(i))); } void print(TRIP i) { fprintf(stdout, " "); fprintf(stdout, "fprintf(stdout, %c%%d\\n%c, _t%d); fflush(stdout);\n", '"', '"', leftchild(i)); } // this macro was used in debugging, not really needed now. #define put_label(n) xput_label(n, __LINE__) void xput_label(int lab, int line) { fprintf(stdout, "L%02d:\n", lab); } /*>*/ void codegen_three_address(TRIP root) { if (root == -1) return; switch (opsym(root)) { case NOOP: break; case CONST: loadconst("_t%d = %d;\n", root, rightchild(root)); break; case DECLARESCALAR: // TODO: modify for different decl types root = leftchild(root); declare(stringpool+rightchild(leftchild(root))); break; case VAR: loadvar("_t%d = %s;\n", root, nameof(root)); break; case ASSIGNSCALAR: codegen_three_address(rightchild(root)); store("%s = _t%d;\n", nameof(leftchild(root)), rightchild(root)); break; #ifdef TDODO case ASSIGNARRAY: codegen_stack(nthchild(root, 3)); fprintf(stdout, " PUSH &%s\n", nameof(leftchild(root))); // push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TODO) // might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high> codegen_stack(rightchild(root)); // array index fprintf(stdout, " ADD\n"); fprintf(stdout, " POPI\n"); break; case LABEL: fprintf(stdout, "L_%s:\n", nameof(leftchild(root))); break; case DECLARESCALAR: //fprintf(stdout, " NOOP ; DECLARESCALAR. TBD.\n"); break; case DECLAREARRAY: //fprintf(stdout, " NOOP ; DECLAREARRAY. TBD.\n"); break; case DEFFN: current_function_name = stringpool+rightchild(leftchild(leftchild(root))); fprintf(stdout, "%s:\n", current_function_name); // first we use the paramlist in rightchild to push a temporary definition // of each parameter on the namespace stack // push_temporary_parameter_definitions(rightchild(root)); // (and also generate code to pop params from stack to locals) codegen_stack(rightchild(root)); // then we compile code for the definition, which will pick up local parameters // for those idents rather than any globals of the same name // first pop the params off the data stack and assign to locals. // later implementation won't use static locals but will pick up off stack directly codegen_stack(nthchild(root, 3)); // restore_temporary_parameter_definitions(rightchild(root)); // TODO: don't output RET if last statement was 'return' fprintf(stdout, " RET\n"); // NOTE WE'RE ONLY HANDLING ONE IMPLICIT RESULT BY DEFAULT break; case DEFPARAM: // small tweak to ensure parameters popped in reverse order to undo pushes to stack codegen_stack(rightchild(root)); if (opsym(leftchild(root)) == DEFPARAM) codegen_stack(leftchild(root)); else { int formal = rightchild(leftchild(leftchild(root))); int local; static char locals[128]; sprintf(locals, "%s", stringpool+formal); if (nextstring + strlen(locals) + 1 >= MAXPOOL) exit(1); // TODO: add message strcpy(stringpool+nextstring, locals); /* Create a backstop for when not found */ local = str_to_pool(locals); if (local != nextstring) { //////////////////////////////////////fprintf(stderr, "ERROR: local parameter '%s' already exists.\n", locals); } else nextstring += strlen(locals)+1; /* Not found, add it */ rightchild(leftchild(leftchild(root))) = local; // replace for the context of this fn def // the parameter at leftchild(root) is a "VAR" - currently we can cheat // and dive into the string directly, because our model of params (and variables) // is so dumb. However when we start using types properly this will have to change. fprintf(stdout, " POP %s\n", locals); } break; #endif case PARAM: codegen_three_address(leftchild(root)); codegen_three_address(rightchild(root)); break; #ifdef TODO case RETURN: codegen_stack(leftchild(root)); // result is just a <BOOLEXPR>, or -1... fprintf(stdout, " RET\n"); break; #endif case SEQUENCE: codegen_three_address(leftchild(root)); codegen_three_address(rightchild(root)); break; #ifdef TODO case IFTHEN: { int lab = ++nextlab; // for a proper branch, need to look at root node here... codegen_stack(leftchild(root)); // push the condition on the stack (True or False) fprintf(stdout, " BF F_%d\n", lab); codegen_stack(rightchild(root)); fprintf(stdout, "F_%d:\n", lab); } break; case IFTHENELSE: { int lab = ++nextlab; // for a proper branch, need to look at root node here... codegen_stack(leftchild(root)); // push the condition on the stack (True or False) fprintf(stdout, " BF T_%d\n", lab); codegen_stack(rightchild(root)); fprintf(stdout, " B E_%d\n", lab); fprintf(stdout, "T_%d:\n", lab); codegen_stack(nthchild(root, 3)); fprintf(stdout, "E_%d:\n", lab); } break; case REPEATIF: { int lab = ++nextlab; // for a proper branch, need to look at root node here... fprintf(stdout, "B_%d:\n", lab); codegen_stack(leftchild(root)); if (rightchild(root) != -1) { codegen_stack(leftchild(rightchild(root))); fprintf(stdout, " BT B_%d\n", lab); } else { fprintf(stdout, " B B_%d\n", lab); } } break; case GOTO: fprintf(stdout, " B %s\n", stringpool+rightchild(leftchild(leftchild(root)))); break; case FNCALL: codegen_stack(rightchild(root)); // zero or more params fprintf(stdout, " CALL %s\n", stringpool+rightchild(leftchild(root))); break; case INDEX: fprintf(stdout, " PUSH &%s\n", nameof(leftchild(root))); // push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TODO) // might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high> codegen_stack(rightchild(root)); // array index fprintf(stdout, " ADD\n"); fprintf(stdout, " PUSHI\n"); break; #endif case INPUT: input(root); break; case PRINT: codegen_three_address(leftchild(root)); print(root); break; case NEG: case NOT: codegen_three_address(leftchild(root)); monoperate("_t%d = %s_t%d;\n", root, c_infix_op[opsym(root)], leftchild(root)); break; default: /* Be careful not to default anything other than binary operators! */ if (arity[opsym(root)] != 3) { fprintf(stdout, "*** Not Implemented: codegen_three_address(%s)\n", name[opsym(root)]); break; } codegen_three_address(leftchild(root)); codegen_three_address(rightchild(root)); operate("_t%d = (_t%d %s _t%d);\n", root, leftchild(root), c_infix_op[opsym(root)], rightchild(root)); break; } } /*>*/ /*< codegen_c outputs structured c */ char *t[16*1024]; // An array is overkill. Keeping the last token in 'pending' // would be enough! This was in case anything more complicated // turned out to be needed, while developing the fix. Will change later. int last_tok = -1; void emit_c(char *s) { int lastch; // The output is flat and must be run through gnu indent before use. if (s == NULL) { // flush int i; for (i = 0; i <= last_tok; i++) { fprintf(stdout, "%s", t[i]); fprintf(stdout, "%c", (strcmp(t[i], "}") != 0 ? ' ' : '\n')); free(t[i]); } fprintf(stdout, "\n"); last_tok = -1; return; } if (last_tok < 0) { t[++last_tok] = strdup(s); return; } // fix the 'semicolon problem' by suppressing extra ones after ';', '{', and '}' lastch = t[last_tok][strlen(t[last_tok])-1]; if ((strcmp(s, ";") == 0) && ((lastch == ':') || (lastch == '\n'))) { t[++last_tok] = strdup("\n"); // unsatisfactory quick hack for labels return; } if ((strcmp(s, ";") == 0) && ( (lastch == ';') || (lastch == '{') || (lastch == '}') )) return; t[++last_tok] = strdup(s); } void codegen_c(TRIP root) { // Walk the AST and output structured C // ADD: rsym psym selin selout xprompt trap // to do: brackets for reinserting precedence; 'x' symbols; save comments; invert until conds static char tok[128]; if (root == -1) return; switch (opsym(root)) { case NOOP: break; case LINENO: break; case CONST: if (leftchild(root) == TYPE_CHARCONST) { tok[0] = '\''; if (rightchild(root) == '\'' || rightchild(root) == '\\') { tok[1] = '\\'; tok[2] = rightchild(root); tok[3] = '\''; tok[4] = '\0'; } else { tok[1] = rightchild(root); tok[2] = '\''; tok[3] = '\0'; } } else { // assume TYPE_CONST sprintf(tok, "%d", rightchild(root)); } emit_c(tok); break; case VAR: emit_c(nameof(root)); break; case ASSIGNSCALAR: emit_c(nameof(leftchild(root))); emit_c("="); codegen_c(rightchild(root)); break; case ASSIGNARRAY: sprintf(tok, "%s[", nameof(leftchild(root))); emit_c(tok); codegen_c(rightchild(root)); // array index emit_c("]"); emit_c("="); codegen_c(nthchild(root, 3)); break; case LABEL: sprintf(tok, "%s:", nameof(leftchild(root))); emit_c(tok); break; case DECLARESCALAR: // TODO: modify for different decl types emit_c("static int"); emit_c(nameof(leftchild(root))); emit_c(";"); break; case DECLAREARRAY: if (nthchild(root, 3) != -1) emit_c("const"); else emit_c("static"); emit_c("int"); emit_c(nameof(leftchild(root))); emit_c("["); codegen_c(rightchild(root)); emit_c("]"); if (nthchild(root, 3) != -1) { emit_c("=");emit_c("{");codegen_c(nthchild(root, 3));emit_c("};"); // forced semicolon } else emit_c(";"); break; case DEFFN: // treewalk the function body. If 'return' has a parameter, emit int; otherwise emit void... TODO if (locate_returns(root) != -1) emit_c("int"); else emit_c("void"); emit_c(stringpool+rightchild(leftchild(leftchild(root)))); emit_c("("); if (rightchild(root) != -1) { codegen_c(rightchild(root)); // PARAM list - basically just a comma-separated list } else { emit_c("void"); } emit_c(")"); emit_c("{"); codegen_c(nthchild(root, 3)); emit_c(";"); emit_c("}"); break; case DEFPARAM: if (opsym(leftchild(root)) == DEFPARAM) { codegen_c(leftchild(root)); } else { emit_c("int"); emit_c(stringpool+rightchild(leftchild(leftchild(root)))); } if (rightchild(root) != -1) { emit_c(","); codegen_c(rightchild(root)); } break; case PARAM: codegen_c(leftchild(root)); if (rightchild(root) != -1) emit_c(","); codegen_c(rightchild(root)); break; case RETURN: emit_c("return"); if (leftchild(root) != -1) codegen_c(leftchild(root)); // result is just a <BOOLEXPR>, or -1... break; case SEQUENCE: {TRIP child; child = leftchild(root); codegen_c(child); if (child == -1) { } else if (opsym(child) == FNCALL) { emit_c(";"); } else emit_c(";"); child = rightchild(root); codegen_c(child); if (child == -1) { } else if (opsym(child) == FNCALL) { emit_c(";"); // need to check this is correct. Write some test edge cases. } } break; case IFTHEN: emit_c("if"); emit_c("("); codegen_c(leftchild(root)); emit_c(")"); if (opsym(rightchild(root)) == SEQUENCE) emit_c("{"); codegen_c(rightchild(root)); if (opsym(rightchild(root)) == SEQUENCE) {emit_c(";"); emit_c("}");} break; case IFTHENELSE: emit_c("if"); emit_c("("); codegen_c(leftchild(root)); emit_c(")"); if (opsym(rightchild(root)) == SEQUENCE) emit_c("{"); codegen_c(rightchild(root)); if (opsym(rightchild(root)) == SEQUENCE) {emit_c(";"); emit_c("}");} else emit_c(";"); emit_c("else"); if (opsym(nthchild(root, 3)) == SEQUENCE) emit_c("{"); codegen_c(nthchild(root, 3)); if (opsym(nthchild(root, 3)) == SEQUENCE) {emit_c(";"); emit_c("}");} else emit_c(";"); break; case REPEATIF: if (rightchild(root) != -1) { emit_c("do"); emit_c("{"); codegen_c(leftchild(root)); emit_c(";"); emit_c("}"); emit_c("while"); emit_c("("); codegen_c(leftchild(rightchild(root))); emit_c(")"); } else { emit_c("for"); emit_c("(;;)"); emit_c("{"); codegen_c(leftchild(root)); emit_c(";"); emit_c("}"); } break; case GOTO: emit_c("goto"); emit_c(stringpool+rightchild(leftchild(leftchild(root)))); break; case FNCALL: emit_c(stringpool+rightchild(leftchild(root))); emit_c("("); codegen_c(rightchild(root)); // zero or more params emit_c(")"); break; case INDEX: emit_c(nameof(leftchild(root))); emit_c("["); codegen_c(rightchild(root)); // array index emit_c("]"); break; #ifdef TODO case INPUT: input(root); break; case PRINT: codegen_three_address(leftchild(root)); print(root); break; #endif case NEG: case NOT: // ( don't forget to check for operator precedence and insert brackets if needed ) emit_c(c_infix_op[opsym(root)]); codegen_c(leftchild(root)); break; default: /* Be careful not to default anything other than binary operators! */ if ((root < 0) || (root >= MAXTRIPS)) { fprintf(stdout, "*** Out of range: codegen_c(%d)\n", root); break; } else { int op = opsym(root); if (arity[op] != 3) { if ((op < 0) || (op >= MAX_OPCODE)) { fprintf(stdout, "*** Not Implemented: codegen_c(%d)\n", op); break; } else { fprintf(stdout, "*** Not Implemented: codegen_c(%s)\n", name[op]); break; } } /* Given a node in your expression tree, you print it out using a straightforward algorithm which recursively prints out its two subtrees, with the operator inbetween. All you need to decide is whether to print brackets around each of the subtrees. You don't need to if the precedence of the subtree operator is higher (in the target language) than that of the middle operator, nor, of course, if the subtree is leafy. You do need to if it's lower. That only leaves the question of what to do if it's the same. In general, it's non-trivial, but in practice all your operators are going to be left-associative (except assignments and unaries which come out in the wash), so this means (doesn't it?) that the left tree won't need them and the right one might. */ { int leftbranch = leftchild(root); int left_op = opsym(leftbranch); int left_prec = prio[left_op]; int mid_prec = prio[op]; if (left_prec < mid_prec) emit_c("("); codegen_c(leftchild(root)); if (left_prec < mid_prec) emit_c(")"); } emit_c(c_infix_op[op]); { int rightbranch = rightchild(root); int right_op = opsym(rightbranch); int right_prec = prio[right_op]; int mid_prec = prio[op]; if (right_prec < mid_prec) emit_c("("); codegen_c(rightchild(root)); if (right_prec < mid_prec) emit_c(")"); } break; } } } /*>*/ /*< codegen_stack generates for a stack machine */ static char *SPACES = " "; void stack_emit(char *label, char *opcode, char *addressing_mode, char *operand, char *comment) { int col = 0, spaces; col += fprintf(stdout, "%s", label); if (strlen(label) >= 9) spaces = 1; else spaces = 10-strlen(label); col += fprintf(stdout, SPACES+strlen(SPACES)-spaces-1); col += fprintf(stdout, "%s", opcode); if (strlen(opcode) >= 6) spaces = 1; else spaces = 7-strlen(opcode); col += fprintf(stdout, SPACES+strlen(SPACES)-spaces-1); col += fprintf(stdout, "%s%s", addressing_mode, operand); if (*comment != '\0') { if (col >= 40) spaces = 1; else spaces = 25-strlen(operand); col += fprintf(stdout, SPACES+strlen(SPACES)-spaces-1); fprintf(stdout, "%s", comment); } fprintf(stdout, "\n"); } static char *stackasm_const(TRIP root) { static char tok[128]; if (leftchild(root) == TYPE_CHARCONST) { tok[0] = '\''; if (rightchild(root) == '\'' || rightchild(root) == '\\') { tok[1] = '\\'; tok[2] = rightchild(root); tok[3] = '\''; tok[4] = '\0'; } else { tok[1] = rightchild(root); tok[2] = '\''; tok[3] = '\0'; } } else { // assume TYPE_CONST sprintf(tok, "%d", rightchild(root)); } return tok; // for safety, could use stringpool? } void codegen_stack(TRIP root) { static char *current_function_name=""; // temp hack. real soln involves a scope/block-stack static int nextlab = 1000; char tok[128]; /* MUST NOT BE STATIC! */ if (root == -1) return; switch (opsym(root)) { case NOOP: //stack_emit("", "; Suppressed declaration", "", "", "; If not, check the source. Use a -1 rather than NOOP where possible"); break; case LINENO: // output source code from last line to this line here break; case CONST: stack_emit("", "PUSH", "#", stackasm_const(root), ""); break; case VAR: stack_emit("", "PUSH", "", nameof(root), ""); break; case ASSIGNSCALAR: codegen_stack(rightchild(root)); stack_emit("", "POP", "", nameof(leftchild(root)), ""); break; case ASSIGNARRAY: codegen_stack(nthchild(root, 3)); stack_emit("", "PUSH", "&", nameof(leftchild(root)), ""); // push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TODO) // might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high> codegen_stack(rightchild(root)); // array index stack_emit("", "ADD", "", "", ""); stack_emit("", "POPI", "", "", ""); break; case LABEL: stack_emit(nameof(leftchild(root)), "", "", "", ""); // Add "L_" in front of user labels? break; case DECLARESCALAR: // initialisedscalar should use .data - currently we are generating a separate explicit assignment stack_emit(nameof(leftchild(root)), ".word", "", "1", ""); break; case DECLAREARRAY: if (nthchild(root, 3) != -1) { stack_emit(nameof(leftchild(root)), ".word", "", stackasm_const(rightchild(root)), ""); //number of words to follow - can be removed //codegen_stack(nthchild(root, 3)); // a .data nnnn statement for each item } else { stack_emit(nameof(leftchild(root)), ".word", "", stackasm_const(rightchild(root)), ""); // rightchild() is wrong. Need to look inside } break; case DEFFN: current_function_name = stringpool+rightchild(leftchild(leftchild(root))); stack_emit(current_function_name, "", "", "", "; Proc/fn entry point"); // first we use the paramlist in rightchild to push a temporary definition // of each parameter on the namespace stack // push_temporary_parameter_definitions(rightchild(root)); // (and also generate code to pop params from stack to locals) codegen_stack(rightchild(root)); // then we compile code for the definition, which will pick up local parameters // for those idents rather than any globals of the same name // first pop the params off the data stack and assign to locals. // later implementation won't use static locals but will pick up off stack directly codegen_stack(nthchild(root, 3)); // restore_temporary_parameter_definitions(rightchild(root)); // TODO: don't output RET if last statement was 'return' stack_emit("", "RET", "", "", ""); break; case DEFPARAM: // small tweak to ensure parameters popped in reverse order to undo pushes to stack codegen_stack(rightchild(root)); if (opsym(leftchild(root)) == DEFPARAM) codegen_stack(leftchild(root)); else { int formal = rightchild(leftchild(leftchild(root))); int local; static char locals[128]; sprintf(locals, "%s", stringpool+formal); if (nextstring + strlen(locals) + 1 >= MAXPOOL) exit(1); // TODO: add message strcpy(stringpool+nextstring, locals); /* Create a backstop for when not found */ local = str_to_pool(locals); if (local != nextstring) { //////////////////////////////////////fprintf(stderr, "ERROR: local parameter '%s' already exists.\n", locals); } else nextstring += strlen(locals)+1; /* Not found, add it */ rightchild(leftchild(leftchild(root))) = local; // replace for the context of this fn def // the parameter at leftchild(root) is a "VAR" - currently we can cheat // and dive into the string directly, because our model of params (and variables) // is so dumb. However when we start using types properly this will have to change. stack_emit("", "POP", "", locals, ""); } break; case PARAM: codegen_stack(leftchild(root)); codegen_stack(rightchild(root)); break; case RETURN: codegen_stack(leftchild(root)); // result is just a <BOOLEXPR>, or -1... stack_emit("", "RET", "", "", ""); break; case SEQUENCE: codegen_stack(leftchild(root)); codegen_stack(rightchild(root)); break; case IFTHEN: { int lab = ++nextlab; // for a proper branch, need to look at root node here... codegen_stack(leftchild(root)); // push the condition on the stack (True or False) sprintf(tok, "F_%d", lab); stack_emit("", "BF", "", tok, ""); codegen_stack(rightchild(root)); stack_emit(tok, "", "", "", ""); } break; case IFTHENELSE: { int lab = ++nextlab; // for a proper branch, need to look at root node here... codegen_stack(leftchild(root)); // push the condition on the stack (True or False) sprintf(tok, "T_%d", lab); stack_emit("", "BF", "", tok, ""); codegen_stack(rightchild(root)); sprintf(tok, "E_%d", lab); stack_emit("", "B", "", tok, ""); sprintf(tok, "T_%d", lab); stack_emit(tok, "", "", "", ""); codegen_stack(nthchild(root, 3)); sprintf(tok, "E_%d", lab); stack_emit(tok, "", "", "", ""); } break; case REPEATIF: { int lab = ++nextlab; // for a proper branch, need to look at root node here... sprintf(tok, "B_%d", lab); stack_emit(tok, "", "", "", ""); codegen_stack(leftchild(root)); if (rightchild(root) != -1) { codegen_stack(leftchild(rightchild(root))); stack_emit("", "BT", "", tok, ""); } else { stack_emit("", "B", "", tok, ""); } } break; case GOTO: stack_emit("", "B", "", stringpool+rightchild(leftchild(leftchild(root))), ""); break; case FNCALL: codegen_stack(rightchild(root)); // zero or more params stack_emit("", "CALL", "", stringpool+rightchild(leftchild(root)), ""); break; case INDEX: stack_emit("", "PUSH", "&", nameof(leftchild(root)), ""); // push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TODO) // might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high> codegen_stack(rightchild(root)); // array index stack_emit("", "ADD", "", "", ""); stack_emit("", "PUSHI", "", "", ""); break; #ifdef TODO case INPUT: input(root); break; case PRINT: codegen_three_address(leftchild(root)); print(root); break; #endif case NEG: case NOT: codegen_stack(leftchild(root)); stack_emit("", name[opsym(root)], "", "", ""); break; default: /* Be careful not to default anything other than binary operators! */ if ((root < 0) || (root >= MAXTRIPS)) { fprintf(stdout, "*** Out of range: codegen_stack(%d)\n", root); break; } else { int op = opsym(root); if (arity[op] != 3) { if ((op < 0) || (op >= MAX_OPCODE)) { fprintf(stdout, "*** Not Implemented: codegen_stack(%d)\n", op); break; } else { fprintf(stdout, "*** Not Implemented: codegen_stack(%s)\n", name[op]); break; } } codegen_stack(leftchild(root)); codegen_stack(rightchild(root)); stack_emit("", name[op], "", "", ""); break; } } } /*>*/ /*< Run-time interpreter: Cheap & nasty code execution directly from the AST! */ // Trivial run-time stack implementation: static int stack[128]; static int stackp = -1; void Push(int val) { fprintf(stderr, "Push(%d)\n", val); stack[++stackp] = val; } int Pop(void) { fprintf(stderr, "Pop() -> %d\n", stack[stackp]); return stack[stackp--]; } /*< (a little hack) */ // What I should be doing here is making a VAR tuple one unit larger, with the // extra unit pointing to the memory address/offset where the var is stored. // This was in fact always part of the plan (along with more complex type information) // but at this stage the code is really just a placeholder. // Meanwhile, by having an array the same size as the AST, we have an obvious // place where we can store the data, and for now do not need to implement // the DECLARE opcode in the interpreter below... int wasteful_array[MAXTRIPS]; // this hack doesn't work for arrays, only scalars #define variable_contents(memloc) wasteful_array[memloc] // NEEDS A COMPLETE REWRITE TO ALLOCATE MEMORY PROPERLY AND STORE IN 'VAR' STRUCTURE - need 'int *mem' mallocked appropriately /*>*/ // We cannot execute the AST in the obvious manner because this language (and our test program) // use jumps all over the place. We have to flatten and serialise all the control flow structures // such as repeat/until loops etc. We can probably keep procedures as a recursive call however // since jumping into and out of procedures is not allowed. // No problem leaving expressions and assignments etc as high-level objects. We don't need to // compile down to anything as basic as a byte code. static int CODE[16*1024]; // These two arrays are very temp hacks static int Memory[640*1024]; // old PC size :-) static int CODEPC = 0; typedef int CODEP; CODEP linear_code(TRIP orig) { CODEP here = CODEPC; CODE[CODEPC] = make_unary_tuple(LINEAR_CODE, orig); CODEPC += 1; return here; } CODEP plant_code(TRIP orig) { CODEP here = CODEPC; CODE[CODEPC] = orig; CODEPC += 1; return here; } // prototype symbol table management - VERY TEMPORARY CODEP lookup_linearised_proc(char *name) { fprintf(stderr, "NOT IMPLEMENTED: lookup_linearised_proc(%s)\n", name); return 0; } CODEP lookup_jump_label(char *name) { fprintf(stderr, "NOT IMPLEMENTED: lookup_jump_label(%s)\n", name); return 0; } void define_linearised_proc(char *name, CODEP addr) { fprintf(stderr, "NOT IMPLEMENTED: define_linearised_proc(%s, %d)\n", name, addr); } void define_jump_label(char *name, CODEP addr) { fprintf(stderr, "NOT IMPLEMENTED: define_jump_label(%s, %d)\n", name, addr); } void serialise_AST(TRIP root) { // make a few tweaks to the data structure to make execution easier static char *current_function_name; int tmp1, tmp2; if (root == -1) return; //zxcv switch (opsym(root)) { case LABEL: // enter (nameof(leftchild(root)), CODEPC) into table for later retrieval by lookup_jump_label define_jump_label(nameof(leftchild(root)), CODEPC); break; case DEFFN: // need to enter address of function in a table, and linearise the code in the function too. // enter the procedure name in a table of functions mapping to linearised code addresses current_function_name = stringpool+rightchild(leftchild(leftchild(root))); define_linearised_proc(current_function_name, CODEPC); // first we use the paramlist in rightchild to push a temporary definition // of each parameter on the namespace stack // push_temporary_parameter_definitions(rightchild(PC)); // (and also generate code to pop params from stack to locals) serialise_AST(rightchild(root)); // then we compile code for the definition, which will pick up local parameters // for those idents rather than any globals of the same name // first pop the params off the data stack and assign to locals. // later implementation won't use static locals but will pick up off stack directly serialise_AST(nthchild(root, 3)); // restore_temporary_parameter_definitions(rightchild(PC)); plant_code(make_unary_tuple(RETURN, -1)); // fallback return for procs - error if function break; case FNCALL: // look up address of function from table, substitute address into leftchild(root) element // We'll probably insist that a function is fully defined before it is used, otherwise we need to patch like with jump labels linear_code(rightchild(root)); // zero or more params // plant updated copy of call: plant_code(make_unary_tuple(FNCALL, lookup_linearised_proc(stringpool+rightchild(leftchild(root))))); break; case RETURN: // plant code to push return parameter if present, then plant the return opcode (ie copy of this trip) linear_code(leftchild(root)); // result is just a <BOOLEXPR>, or -1... plant_code(make_unary_tuple(RETURN, -1)); // simple return, param already handled break; case SEQUENCE: serialise_AST(leftchild(root)); serialise_AST(rightchild(root)); break; case IFTHEN: linear_code(leftchild(root)); // push the condition on the stack (True or False) plant_code(tmp1 = make_unary_tuple(BF, -1)); serialise_AST(rightchild(root)); leftchild(tmp1) = CODEPC; break; case IFTHENELSE: // same sort of thing as above linear_code(leftchild(root)); // push the condition on the stack (True or False) plant_code(tmp1 = make_unary_tuple(BF, -1)); // jump over 'then' part serialise_AST(rightchild(root)); plant_code(tmp2 = make_unary_tuple(B, -1)); // jump over 'else' part leftchild(tmp1) = CODEPC; serialise_AST(nthchild(root, 3)); leftchild(tmp2) = CODEPC; break; case REPEATIF: tmp1 = CODEPC; // note this address for jump back serialise_AST(leftchild(root)); if (rightchild(root) != -1) { linear_code(leftchild(rightchild(root))); // test condition plant_code(make_unary_tuple(BT, tmp1)); // conditional branch back to start of loop } else { plant_code(make_unary_tuple(B, tmp1)); // unconditional branch back to start of loop } break; case GOTO: // look up code address of label. may not be planted yet so will need to plug later. Chain back? tmp1 = lookup_jump_label(stringpool+rightchild(leftchild(leftchild(root)))); plant_code(make_unary_tuple(B, tmp1)); // jump over 'else' part break; default: linear_code(root); // shouldn't be any flow control issues in remaining opcodes break; } } void execute_AST(TRIP PC) { int op, opd; TRIP left, right; if (PC == -1) return; // remaining opcodes must not do any flow-control, just simple imperative statements //zxcv fprintf(stderr, "execute_AST(%d)\n", PC); fprintf(stderr, " %s (%d)\n", name[opsym(PC)], opsym(PC)); switch (opsym(PC)) { case NOOP: break; case LINENO: break; case CONST: Push(rightchild(PC)); break; case VAR: Push(variable_contents(PC)); break; case ASSIGNSCALAR: execute_AST(rightchild(PC)); variable_contents(leftchild(PC)) = Pop(); fprintf(stderr, " %s = %d\n", // Until I add a "print" command to this language, we can see the results of nameof(leftchild(PC)), // computations by a simple hack, which is to print out the value of any assignments. variable_contents(leftchild(PC))); break; case ASSIGNARRAY: #ifdef TODO codegen_stack(nthchild(PC, 3)); stack_emit("", "PUSH", "&", nameof(leftchild(PC)), ""); // push the ADDRESS of the LHS - easy if it is a name, but be careful for a[2][3] etc (TODO) // might not be so hard if we implement arrays as a dopevector consisting of <addr,low,high> codegen_stack(rightchild(PC)); // array index stack_emit("", "ADD", "", "", ""); stack_emit("", "POPI", "", "", ""); #endif break; case LABEL: #ifdef TODO stack_emit(nameof(leftchild(PC)), "", "", "", ""); // Add "L_" in front of user labels? #endif break; case DECLARESCALAR: #ifdef TODO // initialisedscalar should use .data - currently we are generating a separate explicit assignment stack_emit(nameof(leftchild(PC)), ".word", "", "1", ""); #endif break; case DECLAREARRAY: #ifdef TODO if (nthchild(PC, 3) != -1) { stack_emit(nameof(leftchild(PC)), ".word", "", stackasm_const(rightchild(PC)), ""); //number of words to follow - can be removed //codegen_stack(nthchild(PC, 3)); // a .data nnnn statement for each item } else { stack_emit(nameof(leftchild(PC)), ".word", "", stackasm_const(rightchild(PC)), ""); // rightchild() is wrong. Need to look inside } #endif break; case DEFPARAM: #ifdef TODO // small tweak to ensure parameters popped in reverse order to undo pushes to stack codegen_stack(rightchild(PC)); if (opsym(leftchild(PC)) == DEFPARAM) codegen_stack(leftchild(PC)); else { int formal = rightchild(leftchild(leftchild(PC))); int local; static char locals[128]; sprintf(locals, "%s", stringpool+formal); if (nextstring + strlen(locals) + 1 >= MAXPOOL) exit(1); // TODO: add message strcpy(stringpool+nextstring, locals); /* Create a backstop for when not found */ local = str_to_pool(locals); if (local != nextstring) { //////////////////////////////////////fprintf(stderr, "ERROR: local parameter '%s' already exists.\n", locals); } else nextstring += strlen(locals)+1; /* Not found, add it */ rightchild(leftchild(leftchild(PC))) = local; // replace for the context of this fn def // the parameter at leftchild(PC) is a "VAR" - currently we can cheat // and dive into the string directly, because our model of params (and variables) // is so dumb. However when we start using types properly this will have to change. stack_emit("", "POP", "", locals, ""); } #endif break; case PARAM: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); break; case INDEX: // push_address(nameof(leftchild(PC))); - or ? - push_address(leftchild(PC)); execute_AST(rightchild(PC)); // calculate array index right = Pop(); left = Pop(); Push(Memory[left+right]); // contents of memory at address+offset (integer offset, not byte offset) break; case INPUT: #ifdef TODO input(PC); #endif break; case PRINT: #ifdef TODO execute_AST(leftchild(PC)); print(PC); #endif break; case NEG: execute_AST(leftchild(PC)); Push(-Pop()); break; case NOT: execute_AST(leftchild(PC)); Push(!Pop()); // Boolean NOT, not bitwise NOT break; // BAND, BOR, case BAND: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left && right); break; case BOR: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left || right); break; // ADD, SUB, MUL, DIV, case ADD: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left + right); break; case SUB: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left - right); break; case MUL: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left * right); break; case DIV: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); if (right == 0) { fprintf(stderr, "Run-time error: divide by zero\n"); exit(1); } Push((int)(left / right)); break; // LSH, RSH, EXP, case LSH: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left << right); break; case RSH: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left >> right); break; case EXP: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); //Push(iexp(left, right)); {int temp = left; while (right >= 2) { left = left * temp; right -= 1; } } Push(left); break; // EQ, NE, LT, GT, LE, GE, case EQ: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left == right); break; case NE: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left != right); break; case LT: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left < right); break; case GT: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left > right); break; case LE: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left <= right); break; case GE: execute_AST(leftchild(PC)); execute_AST(rightchild(PC)); right = Pop(); left = Pop(); Push(left >= right); break; // These opcodes should never be executed as they've already been converted to B/BT/BF case IFTHEN: case IFTHENELSE: case REPEATIF: case GOTO: case FNCALL: // handled in level above case RETURN: case DEFFN: // linearised already case SEQUENCE: // should not happen after serialisation! default: // INTERNAL ERROR! fprintf(stdout, "*** Not Implemented: execute_AST(%s)\n", name[opsym(PC)]); exit(1); break; } } static TRIP pcstack[1024]; static int pcstackp = 0; void push_returnaddr(TRIP PC) { pcstack[pcstackp++] = PC; } TRIP pop_returnaddr(void) { return pcstack[--pcstackp]; } void execute_serialised_AST(CODEP PC) { int op, opd; TRIP trip, left, right; // *only* special AST objects for handling flow control. Simple imperatives are called the old way // THIS COULD BE DONE WITH A CODE[] ARRAY. DOES NOT NEED TO OVERLOAD AST[] //zxcv for (;;) { trip = CODE[PC]; op = opsym(trip); fprintf(stderr, "PC: %04x (op=%s (%d))\n", PC, name[op], op); switch (op) { case B: if (leftchild(trip) == -2) exit(0); // hack to terminate program PC = leftchild(trip); break; case BT: opd = Pop(); if (opd) PC = leftchild(trip); else PC += 1; break; case BF: opd = Pop(); if (!opd) PC = leftchild(trip); else PC += 1; break; case FNCALL: push_returnaddr(PC+1); PC = leftchild(trip); break; case RETURN: PC = pop_returnaddr(); break; case LINEAR_CODE: execute_AST(leftchild(trip)); PC += 1; break; default: fprintf(stderr, "execute_serialised_AST: bad opcode %d\n", op); exit(1); } fprintf(stderr, "PC after: %04x\n", PC); } } /*>*/ /*>*/ /*>*/ int main(int argc, char **argv) { int opt_3address = FALSE, opt_debug = FALSE, opt_stack = FALSE, opt_c = FALSE, opt_execute = TRUE; char *s; #ifdef DEBUG // GDB backtrace facility! // extern void restart_under_gdb(int argc, char **argv); // restart_under_gdb(argc, argv); #endif /*< Handle program arguments */ /*< Get clean version of executable name. Should work on most existing systems (2006) */ progname = strdup(argv[0]); if ((s = strrchr(progname, '/')) != NULL) progname = s+1; // Unix if ((s = strrchr(progname, '\\')) != NULL) progname = s+1; // M$ if ((s = strrchr(progname, ']')) != NULL) progname = s+1; // Dec if ((s = strrchr(progname, ';')) != NULL) *s = '\0'; // Version no's if (((s = strrchr(progname, '.')) != NULL) && (strcasecmp(s, ".exe") == 0)) *s = '\0'; if (((s = strrchr(progname, '.')) != NULL) && (strcasecmp(s, ".com") == 0)) *s = '\0'; /*>*/ if ((argc == 3) && strcmp(argv[1], "-d") == 0) { argv++; argc--; debug_parser = TRUE; } if (argc != 2) { fprintf(stderr, "syntax: %s [-3cdehs] filename\n", progname); exit(1); } sourcefile = fopen(argv[1], "r"); if (sourcefile == NULL) { fprintf(stderr, "%s: %s - %s\n", progname, strerror(errno), argv[1]); exit(errno); } curfile = argv[1]; startline = TRUE; whitespace = TRUE; onecharstr = (char *)malloc(512); /*>*/ /*< Lexical scan */ line_reconstruction(); // Effectively, lexing. /*< Debug the lexed tokens? */ #ifdef DEBUG_PARSER if (debug_parser) { int i; // DEBUG ONLY fprintf(stderr, "\nLexical token stream:\n\n"); for (i = 0; i < nextfree; i++) { fprintf(stderr, "C[%d] => %s, line %d, col %d: [%0d] %s\n", i, c[i].f, c[i].l, c[i].col, c[i].t, c[i].s); } } #endif /*>*//*>*/ /*< Call the parser */ #ifdef DEBUG_TRIPS_DURING fprintf(stderr, "Trips before patching holes:\n"); #endif if (!parse(PHRASE_BASE, 0)) { /*< Attempt to print a sensible error if the parse failed */ if (bestparse == nextfree) { fprintf(stderr, "\"%s\", Line %d, Col %d: Premature end of file while looking for %s\n", argv[1], c[bestparse].l, c[bestparse].col+1, looking_for); } else { int i; fprintf(stderr, "\"%s\", Line %d, Col %d: Syntax error while looking for %s near ", argv[1], c[bestparse].l, c[bestparse].col+1, looking_for); for (i = bestparse; i < bestparse+3; i++) { if (i == nextfree) { fprintf(stderr, "<End of file>"); break; } switch (c[i].t) { case TYPE_HEXINT: fprintf(stderr, "$"); // *OR* ... We could put the '$' back in front of the string /* drop through */ // and probably save much code whenever printing. Use str+1 case TYPE_TAG: case TYPE_CHAR: case TYPE_INT: case TYPE_KEYWORD: fprintf(stderr, "%s", c[i].s); break; case TYPE_STRING: fprintf(stderr, "\"%s\"", c[i].s); break; case TYPE_CHARCONST: fprintf(stderr, "'%s'", c[i].s); break; } fprintf(stderr, (i == (bestparse+2) ? " ..." : " ")); } fprintf(stderr, "\n"); } /*>*/ exit(1); }/*>*/ /*< Generate code */ /*< Debugging */ #ifdef DEBUG_PARSER if (debug_parser) walk_analysis_record(0, 0, want_all, print_all); // Diags: print final parse tree #endif /*>*/ {int program; /*< Debugging */ #ifdef DEBUG_TRIPS_DURING fprintf(stdout, "Trips as they are created (before patching holes):\n"); fflush(stdout); #endif /*>*/ program = compile(0, 0); if (nexttrip <= 0) { fprintf(stdout, "\nError: no code generated! (nexttrip = %d)\n", nexttrip); exit(0); } /*< Debugging */ #ifdef DEBUG_TRIPS_AFTER fprintf(stdout, "\n\nAST (with trips after patching holes; nexttrip = %d):\n\n", nexttrip); fflush(stdout); {int i, lasttrip; for (i = 0; i < nexttrip; i += arity[AST[i]]) printtrip(lasttrip = i); fprintf(stdout, "\n\n"); fflush(stdout); #ifdef DEBUG_TREES output_top_level_statements(lasttrip, 0, print_trees); #endif } #endif /*>*/ #ifdef DEBUG if (opt_debug) { int i, l; l = c[0].l; fprintf(stdout, "\ntokens:\n\n %4d: ", l); fflush(stdout); for (i = 0; i < nextfree; i++) { if (c[i].l != l) {fprintf(stdout, "\n %4d: ", c[i].l); l = c[i].l;} if (c[i].t == TYPE_CHARCONST) { fprintf(stdout, "'%s' ", c[i].s); } else { fprintf(stdout, "%s ", c[i].s); } } fprintf(stdout, "\n\n"); } #endif // Now generate the output code from the AST. if (opt_3address) { fprintf(stdout, "\nAST serialised into three-address code:\n\n"); codegen_three_address(program); } else if (opt_stack) { stack_emit("; Stack-based code", "","","",""); Walk_AST(program, 0, want_decls, output_stack_code); Walk_AST(program, 0, want_procs, output_stack_code); stack_emit("__start", "","","","; Main Entry Point"); hide_DEFFN_and_NOOP_and_DECLS(program); // Hide procedures and local decls (irreversible, so this codegen better come last) codegen_stack(program); stack_emit("", "EXIT","","",""); } else if (opt_c) { emit_c("#include <stdio.h>\n"); emit_c("#include <stdlib.h>\n"); emit_c("\n"); emit_c("static int instream = 0, outstream = 0;\n"); emit_c("FILE *infile = NULL, *secondary = NULL, *outfile = NULL;\n"); emit_c("\n"); emit_c("int trap(int mask)\n"); emit_c("{\n"); emit_c(" return (0);\n"); emit_c("}\n"); emit_c("\n"); emit_c("void dosignal(int i, int j, int k)\n"); emit_c("{\n"); emit_c(" if (i == 0) exit(0);\n"); emit_c(" if ((i == 14) && (j == 2)) {\n"); emit_c(" fprintf(stderr, \"Missing switch label: '%c'\\n\", k); exit(1);\n"); emit_c(" }\n"); emit_c(" fprintf(stderr, \"Unhandled signal %d %d %d\\n\", i, j, k);\n"); emit_c(" exit(1);\n"); emit_c("}\n"); emit_c("\n"); emit_c("int consoleget(void)\n"); emit_c("{\n"); emit_c(" int c;\n"); emit_c(" c = fgetc( stdin );\n"); emit_c(" return c;\n"); emit_c("}\n"); emit_c("\n"); emit_c("int fileget(int stream)\n"); emit_c("{\n"); emit_c(" int c;\n"); emit_c(" c = fgetc( (stream == 1 ? infile : secondary) );\n"); emit_c(" return c;\n"); emit_c("}\n"); emit_c("\n"); emit_c("void putsym(int c)\n"); emit_c("{\n"); emit_c(" fputc(c, outfile);\n"); emit_c("}\n"); emit_c("\n"); emit_c("void psym(int c)\n"); emit_c("{\n"); emit_c(" fputc(c, stdout);\n"); emit_c("}\n"); emit_c("\n"); emit_c("void xprompt(int c)\n"); emit_c("{\n"); emit_c(" fputc(c, stdout);\n"); emit_c("}\n"); emit_c("\n"); Walk_AST(program, 0, want_decls, output_c); // output top-level declarations only, not initialisations if dynamic (TODO) emit_c("\n"); Walk_AST(program, 0, want_procs, output_c); hide_DEFFN_and_NOOP_and_DECLS(program); // Hide procedures and local decls (irreversible, so this codegen better come last) emit_c("int main(int argc, char **argv)"); emit_c("{"); emit_c(" if (argc != 3) {\n"); emit_c(" fprintf(stderr, \"syntax: %s infile outfile\\n\", argv[0]);\n"); emit_c(" exit(1);\n"); emit_c(" }\n"); emit_c(" if (strcmp(argv[1], argv[2]) == 0) {\n"); emit_c(" fprintf(stderr, \"%s: output file cannot overwrite input file\\n\", argv[0]);\n"); emit_c(" exit(1);\n"); emit_c(" }\n"); emit_c(" infile = fopen(argv[1], \"r\");\n"); emit_c(" if (infile == NULL) {\n"); emit_c(" fprintf(stderr, \"%s: cannot read file '%s'\\n\", argv[0], argv[1]);\n"); emit_c(" exit(1);\n"); emit_c(" }\n"); emit_c(" outfile = fopen(argv[2], \"w\");\n"); emit_c(" if (outfile == NULL) {\n"); emit_c(" fprintf(stderr, \"%s: cannot write file '%s'\\n\", argv[0], argv[2]);\n"); emit_c(" exit(1);\n"); emit_c(" }\n"); codegen_c(program); // output initialisation of declared variables but not the actual declarations. TODO!!! emit_c("exit(0);"); emit_c("}"); emit_c(NULL); // and flush } else if (opt_execute) { fprintf(stderr, "\nRun-time interpretation.\n\nSerialise AST:\n"); CODEPC = 0; serialise_AST(program); // reassign result to program ? linear_code(make_unary_tuple(B, -2)); // exit at end of program // can we fix up back pointers merely by resetting 'nexttrip' and calling the same code again // to generate linear AST in same sequence, now we know where everything is going to be stored? // (jump label destinations, procedure entry points etc) // start address is wrong. need to note where the '__start' label would have been planted. // note it during serialisation and use it when starting execution // declarations, especially const arrays, still to be done. // there's a problem with 'var' tuples. should only be one, at point of declaration? // not sure yet if it is a conceptual problem or a bug //zxcv { int i; for (i = 0; i < CODEPC;i++) { fprintf(stderr, "CODE %04x: ", i); if (opsym(CODE[i]) == LINEAR_CODE) { int trip = leftchild(CODE[i]); fprintf(stderr, " ... tree starting with ... "); printtrip(trip); } else { // print special trip printtrip(CODE[i]); } } } fprintf(stderr, "\n\nExecute Serialised AST:\n"); execute_serialised_AST(0); } } /*>*/ exit(0); return(1); }