Meta Expressions

Evaluating constant expressions immediately in the preprocessor and/or parser has the disadvantage that all constant names are lost, and only the resulting (integer) value is preserved. For a cross compiler it should be possible to store all expressions in a language-independent meta format. This is an attempt to describe such a meta language for expressions, with the option to extend the language with meta statements.

General Idea

A meta expression represents a parse tree, with explicit operator precedence and grouping.
All expressions are stored in functional form, <op>(<args>). <op> can be either an operator name or a shortcut, as convenient.
The result is a list structure, with a general base syntax:
Expr ::= Value | [ Operator ] "(" [ ExprList ] ")" | Stmt .
ExprList ::= Expr { ";" Expr } .
Value ::= number | identifier | '"' string '"' | "'" character "'" .
Stmt ::= ... //to be defined
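
The base syntax above can be read by a very small recursive-descent parser. The following Python sketch is only an illustration, not a reference implementation: the names tokenize and parse_expr are hypothetical, it accepts both ";" and "," between list entries (separating or terminating), and it does not handle {} comments or the Declaration syntax.

```python
import re

# One token: a quoted literal (closing quote escaped by doubling), a single
# punctuation character, or a run of other characters (name, number, operator).
TOKEN = re.compile(r'''\s*("(?:[^"]|"")*"|'(?:[^']|'')*'|[();,]|[^\s();,"']+)''')

def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_expr(tokens, i=0):
    """Parse one Expr starting at tokens[i]; return (node, next_index).

    A Value is returned as its token string, an operation as (op, [args]).
    Assumption of this sketch: the input is well-formed (balanced parentheses).
    """
    op = ""
    if tokens[i] != "(":
        op = tokens[i]
        i += 1
        if i >= len(tokens) or tokens[i] != "(":
            return op, i                  # a plain Value
    i += 1                                # skip "("
    args = []
    while tokens[i] != ")":
        node, i = parse_expr(tokens, i)
        args.append(node)
        if tokens[i] in (",", ";"):       # separator or terminator
            i += 1
    return (op, args), i + 1

tree, _ = parse_expr(tokenize("+(a, *(b, 2))"))
print(tree)  # ('+', ['a', ('*', ['b', '2'])])
```

A subexpression with an empty operator, such as (x), comes out as ("", ["x"]), so grouping survives in the tree.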

Declaration ::= TypeDecl | StructDecl | ProcDecl | VarDecl | MacroDecl | LabDecl .
    TypeDecl ::= "T" ID "=" Type .
        ID ::= "#" number | [ "{" ID "}" ] identifier [ "." ID ].
    StructDecl ::= StructID ID "{" { MemberDecl ";" } "}" .
        StructID ::= "E" | "S" | "U" | "I" .
        MemberDecl ::= ("m"|"c"|"d"|"p") ID [ ":" Type ] [ "{" Expr "}" ] .
    ProcDecl ::= ("F"|"P"|"M") identifier ProcType [ "{" [StmtList] "}" ] .
        ProcType ::= [Scope] ["P"] "(" Params ")" [Call] [Type] .
    MacroDecl ::= ("#"|"X"|"M") identifier [ "(" Params ")" ] "{" [StmtList] "}" .
    VarDecl ::= ("V"|"C") [Scope] identifier [ ":" Type ] [ "{" [ExprList] "}" ] .
    LabDecl ::= "L" identifier .

DeclID ::= "T" | "E" | "S" | "U" | "F" | "P" | "V" | "C" | "L" .
Item ::= [ Literal ] [ "(" { Item "," } ")" ] [ ":" Item ] [ "=" Expr ] .
Literal ::= number | identifier | '"' string '"' | "'" character "'" .

VarConst ::= identifier [ ":" Type ] [ "{" ExprList "}" ] .
Type ::= identifier "=" { TypeMod } ( '"' identifier '"' | BaseType ) .
TypeMod ::= "*" | "[" { ExprList } "]" | "(" ParamList ")" |

Identifiers can be alphanumeric or operator shortcuts. They cannot begin with any of the other Literal item delimiters.
Parentheses must always be paired, unless they are embedded in a Literal. For optimized scanning the following set of paired delimiters should be used:

() - general purpose, nestable, can also contain:
{} - comments
"" - strings
'' - characters

Inside Comment, String and Character (Literal) items only the respective closing character has a special meaning. The closing character itself can be escaped by duplication. Sequences of literals with identical leading and trailing characters must be separated by some other character, a space by default. String and Character literals can be interpreted as names of types and variables, according to the Declaration syntax.
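
The escape-by-duplication rule can be illustrated with a pair of helper functions. This is a minimal Python sketch under the rules stated above; the names PAIRS, quote and unquote are hypothetical and not part of the meta format.

```python
# Opening delimiter -> closing delimiter for the three literal kinds.
PAIRS = {'"': '"', "'": "'", "{": "}"}

def quote(text, opener='"'):
    """Wrap text in a literal, doubling every closing character inside it."""
    if "\n" in text:
        raise ValueError("newlines are not allowed inside literals")
    closer = PAIRS[opener]
    return opener + text.replace(closer, closer * 2) + closer

def unquote(literal):
    """Strip the delimiters and undo the doubling of the closing character."""
    closer = PAIRS[literal[0]]
    if len(literal) < 2 or not literal.endswith(closer):
        raise ValueError("unterminated literal")
    return literal[1:-1].replace(closer * 2, closer)

print(quote('say "hi"'))       # "say ""hi"""
print(quote("a}b", "{"))       # {a}}b}
print(unquote("'it''s'"))      # it's
```

Because a doubled closing character would merge with the delimiter of an adjacent literal, two literals of the same kind need a separating character between them, as noted above.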

Commas "," inside item lists are definite item separators (terminators). Other list separators may be used to structure individual items. The suggested convention of item terminators instead of item separators simplifies the input and output procedures for list/tree structures.

Newline and whitespace characters are unspecific separators, with no special meaning, but they can occur only in places where other separators are allowed. Inside Literals all whitespace characters are part of the literal, but newline characters are not allowed.

The semantic interpretation of all Items is implied by the Item name. The part after the closing Item parentheses can have a special structure, as outlined above, but the exact interpretation depends on the actual context.

Operators

At least the following operators <op> must exist for C expressions:

unary operators: op(arg)

not, neg, addr, deref, subexpression, selection, typecast, conversion

The C operators (!, ~, -, *, &) can be used, where ambiguous unary/binary operators can be distinguished by the argument count.
Pre/postfix increment and decrement deserve different operators.
Subexpressions may have an empty operator, i.e. only the expression is stored inside parentheses.
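
Distinguishing the ambiguous C operators by argument count can be sketched as a simple table lookup. The Python snippet below is only an illustration: neg, addr and deref are taken from the operator list above, while sub, mul and bitand are assumed names for the binary forms.

```python
# (C operator, argument count) -> unambiguous operator name.
# "sub", "mul" and "bitand" are assumed names, not defined by this document.
BY_ARITY = {
    ("-", 1): "neg",   ("-", 2): "sub",
    ("*", 1): "deref", ("*", 2): "mul",
    ("&", 1): "addr",  ("&", 2): "bitand",
}

def resolve(op, args):
    """Return the unambiguous name for op, or op itself if it is not ambiguous."""
    return BY_ARITY.get((op, len(args)), op)

print(resolve("*", ["p"]))        # deref
print(resolve("*", ["a", "b"]))   # mul
print(resolve("!", ["x"]))        # !
```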

Member selection (".", "->") can be expressed in the same syntax, with appropriate grouping.
Array selection can be expressed by an array operator [](array,index).
Explicit type casts and conversions should be specified by distinct cast and convert operators. Type conversions can be implied by specific operators, and possibly can be represented by an optional ":"<type> postfix attribute. By convention a double-quoted typename can be used as an identifier for type conversions; this comes close to procedural type casts (C++, Delphi), but may lead to ambiguities. Explicit type casts must be retained in explicit Items, possibly with explicit source as well as target types.

binary operators: op(lhs, rhs)

+ - * / & && | || (etc., as specified for C)
==, <, > (etc.)

Assignment operators can be split into two operations: assign(lhs,op(lhs,rhs))
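
As a sketch of this split, the following Python function rewrites a C compound assignment into the assign(lhs,op(lhs,rhs)) form. The function names and the tuple representation (op, [args]) are assumptions made for the example.

```python
# C compound assignment operator -> underlying binary operator.
COMPOUND = {"+=": "+", "-=": "-", "*=": "*", "/=": "/", "%=": "%",
            "&=": "&", "|=": "|", "^=": "^", "<<=": "<<", ">>=": ">>"}

def split_assignment(op, lhs, rhs):
    """Return assign(lhs,rhs) for "=", else assign(lhs,op(lhs,rhs))."""
    if op == "=":
        return ("assign", [lhs, rhs])
    return ("assign", [lhs, (COMPOUND[op], [lhs, rhs])])

def show(node):
    """Render a node in the functional form <op>(<args>)."""
    if isinstance(node, str):
        return node
    op, args = node
    return f"{op}({','.join(show(a) for a in args)})"

print(show(split_assignment("+=", "x", "1")))   # assign(x,+(x,1))
```

Note that the split duplicates lhs. If lhs has side effects (for example an index expression containing an increment), the duplicated form evaluates them twice, so a consumer may need to keep the combined operator or introduce a temporary in such cases.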

ternary operator

?(cond, true, false)

The same syntax can be used for conditional statements (if, switch), as long as statements can be distinguished from expressions.

sequential operator

seq(expr1,...,exprN)

This operator can be used to represent the C comma operator.

function operator

f(name, args...)
By convention the function name, enclosed in single quotes, can be used as the identifier for a function call.

Operands

Operands can be:

operations (recursive)
symbols
values (integral, floating point numbers, char and string literals)

Symbols can be attributed with values, values can be attributed with an explicit type, according to the Declaration syntax:
[name][:type][=value]
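
A reader for this attribute form can be sketched with one regular expression. The Python snippet below is illustrative only; parse_operand is a hypothetical name, and the sketch assumes that neither name nor type contains a quoted literal with ":" or "=" inside.

```python
import re

# [name][:type][=value], every part optional.
ATTR = re.compile(r"^(?P<name>[^:=]*)(?::(?P<type>[^=]*))?(?:=(?P<value>.*))?$")

def parse_operand(text):
    """Split an operand into name, type and value; missing parts become None."""
    match = ATTR.match(text.strip())
    return {key: (val or None) for key, val in match.groupdict().items()}

print(parse_operand("x:int=5"))     # {'name': 'x', 'type': 'int', 'value': '5'}
print(parse_operand(":float=1.5"))  # {'name': None, 'type': 'float', 'value': '1.5'}
```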

For numeric values more attributes can be inserted, e.g. to distinguish integral and floating point values, the original base of integral values (decimal, octal...) etc.

Statements

Statements and operations sometimes can be represented in the same way (assignment...), as long as it's clear whether a statement or expression is expected.

Operational Statements

assignment, procedure-call, expression

These statements can use the corresponding expression syntax of assignment, function-call and general expressions.

Conditional Statements

As mentioned above, the Ternary operator and If statements can be expressed in the same syntax, with an optional or possibly empty False (last) argument for If statements.
?(cond,true[,false])

A Switch can have discontiguous case values, i.e. none or one statement list can pertain to one or more values, so that a special syntax is required. A Switch can also have a default branch, and different fall-through behaviour.

switch(cond {,case(values,statement)} [,default(statement)]) /* C fall through convention */
select(<same syntax>) /* Pascal else-if like convention */

The <values> argument should be a list of explicit (constant) values, for single values the parentheses around the list may be omitted.
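
To illustrate the else-if convention of select, the following Python sketch lowers a select into nested ?(cond,true[,false]) items. It is an assumption of this example that cond has no side effects (it is repeated in every test), that every case has at least one value, and that nodes are represented as (op, [args]) tuples; lower_select is a hypothetical name.

```python
def lower_select(cond, cases, default=None):
    """Lower select(cond, case(values, stmt)..., default) to nested ?() items.

    cases is a list of (values, stmt) pairs; values is a non-empty list.
    """
    result = default
    for values, stmt in reversed(cases):
        tests = [("==", [cond, value]) for value in values]
        test = tests[0]
        for extra in tests[1:]:
            test = ("||", [test, extra])      # several values share one branch
        args = [test, stmt] if result is None else [test, stmt, result]
        result = ("?", args)
    return result

tree = lower_select("x", [(["1", "2"], "A"), (["5"], "B")], "D")
print(tree)
```

The same lowering is not valid for the C fall-through convention of switch, where control can continue from one case into the next.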

Loops

A general representation for loops should allow for the following attributes:
initialization, stepping, pre/post test.
position of the Continue label/part.
label (name) for directed Breaks.

Distinct loop types may be defined, to reflect e.g.:

counted loops
C style for loops
pre/post-tested loops (repeat/until)

Possibly both a pre- and post-test may be specified, each of which can be empty in endless loops.
for(init,pre-test,body,step/cont,post-test)
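
The general loop form can absorb the usual C loops by leaving unused slots empty. The Python sketch below shows one possible mapping; it assumes that both tests are continue-conditions (the loop runs while the test holds), that an empty slot is written as nothing between commas, and that the helper names are hypothetical.

```python
def loop(init="", pre="", body="", step="", post=""):
    """Render the general loop item; empty strings stand for unused slots."""
    return f"for({init},{pre},{body},{step},{post})"

def c_while(cond, body):
    return loop(pre=cond, body=body)            # while (cond) body

def c_do_while(body, cond):
    return loop(body=body, post=cond)           # do body while (cond)

def c_for(init, cond, step, body):
    return loop(init, cond, body, step)         # for (init; cond; step) body

print(c_while("<(i,10)", "f(work)"))            # for(,<(i,10),f(work),,)
print(c_do_while("f(work)", "<(i,10)"))         # for(,,f(work),,<(i,10))
```

A Pascal repeat/until loop would need its condition negated under this assumption, since until states the exit condition rather than the continue condition.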

Labels and Goto

Labels and Gotos can be represented by:
label(name)
goto(label-expression)

Block Statements

Various kinds of block statements can occur:

compound statements, with or without local variables
try-except/finally
procedure definitions

Statement sequences and simple compound statements can share the same syntax: block(stmt{,stmt}).
Blocks with local variables will occur at least in procedure definitions.
A general Block structure can include a possibly empty list of local variables as the first argument: block(locals,stmts...).

The C setjmp/longjmp constructs should be represented by some try-except syntax, to reflect the irregular control flow for these constructs.

Procedure declarations should follow the Declaration syntax: (params)result-type. The parentheses should not be confused with list-delimiters, i.e. the declaration should be a positional argument of the procedure definition:
proc(name:params-and-type,compound-stmt)

Other Items

Constants, Variables, Initializers and Type Declarations have already been specified; what remains to be specified are the syntactic wrappers around all these items.

In all cases a distinction should be made between internal (local) and exported (public) declarations and definitions. This can be done in Delphi-like unit(...) blocks, with interface(...) and implementation(...) sections, as well as optional uses(...), initialization(...) and finalization(...) sections, with type(...), const(...), var(...), proc(...) and inline(...) subsections. A distinction between Const and Var is not necessary, but IMO procedure declarations and definitions should be grouped into corresponding blocks. Namespaces should be compact, i.e. a namespace (or scope) cannot be extended in other places or units. The Inline subsection is intended for C #defines, which can be interpreted as various higher level declarations (constants, inline procedures...).

Procedure definitions can be split into a procedure type, i.e. roughly the procedure declaration, and the specific implementation. Then the declaration part can be stored in the appropriate (public/local) scope, and the definition part only refers to that declaration, but does not repeat it entirely.

Conditional Compilation

Conditional compilation does not always fit together with a tree structure, when the various branches do not contain complete meta subtrees. In such situations the conditional branches should be completed with duplicates of all required items, and possibly more items until the common continuation point in the meta tree is reached. A parser then can select any (or none) applicable branch for interpretation. A cross compiler can determine the common tails of all branches, and move that common tail out of all branches into the main stream code.
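
Determining the common tail of all branches amounts to finding their longest common suffix. The following Python sketch shows the idea on branches represented as lists of items; factor_common_tail is a hypothetical name, and the sketch assumes that the branches are exhaustive, i.e. exactly one of them is always selected, so that moving the tail out does not change which items are seen.

```python
def factor_common_tail(branches):
    """Return (trimmed_branches, common_tail) for a list of item lists."""
    if not branches:
        return [], []
    shortest = min(len(branch) for branch in branches)
    count = 0
    # Walk backwards while every branch ends with the same item.
    while count < shortest and all(
        branch[-1 - count] == branches[0][-1 - count] for branch in branches
    ):
        count += 1
    if count == 0:
        return [list(branch) for branch in branches], []
    trimmed = [list(branch[:-count]) for branch in branches]
    return trimmed, list(branches[0][-count:])

branches = [["a", "x", "y"], ["b", "c", "x", "y"]]
print(factor_common_tail(branches))   # ([['a'], ['b', 'c']], ['x', 'y'])
```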

Formatting and other Sugar

A cross compiler should retain some formatting information about line breaks and comments. The placement of comments is not always trivial when the control flow or declaration syntax of the target language is too different from the source language. Line breaks and indentation are less important and uncritical; the transformed code can be reformatted by some kind of pretty printer for the target language.

Transformations

For use in a cross compiler it should be possible to define language-specific statement and expression types, for both source and target languages. It should be possible to transform all source-language specific constructs into lower-level constructs, until either the consumer knows how to interpret a construct, or no transformation into the target language is possible at all. The transformation rules should be stored in their own general syntax.
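
The lowering loop described here can be sketched as follows. This Python example is only one possible reading: it keeps the rules in a dictionary of functions instead of a general rule syntax, represents nodes as (op, [args]) tuples, and assumes that the rule set does not cycle. The names lower, rules and known are hypothetical.

```python
def lower(node, rules, known):
    """Rewrite node until every operator is in known, or fail.

    rules maps an operator to a function args -> (op, args);
    known is the set of operators the consumer can interpret.
    """
    if isinstance(node, str):
        return node                       # values and symbols pass through
    op, args = node
    while op not in known:
        if op not in rules:
            raise ValueError(f"no transformation for {op!r}")
        op, args = rules[op](args)
    return (op, [lower(arg, rules, known) for arg in args])

# Example rule: a source-specific "+=" is lowered to assign(lhs,+(lhs,rhs)).
rules = {"+=": lambda a: ("assign", [a[0], ("+", [a[0], a[1]])])}
known = {"assign", "+"}

print(lower(("+=", ["x", "1"]), rules, known))
# ('assign', ['x', ('+', ['x', '1'])])
```

An operator that is neither known nor covered by a rule raises an error, which corresponds to the case where no transformation into the target language is possible.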