Parse Tree

The output of an parser often is stored as a tree. This tree reflects not only the tokens, input from the scanner, but also the grammar rules (productions) used while parsing that input. This document describes a parse tree structure for programming languages, and an according textual streaming format. The C and Delphi (OPL) languages are the first application for this structure, and a C scanner shall be used to read streamed parse trees. Therefore the terms are taken from these languages.

Trees

A tree consists of nodes with none or more child nodes. Every node has an name, type, and possibly more attributes. The node names and types correspond to the rules and terminals of a programming language grammar. For simplicity the node types are taken from the token types of the scanner, with only few added tokens. The bare scanner only produces an subset of the defined token types, in detail all identifiers are returned as general t_sym tokens; these tokens can be mapped by a language specific parser into keyword tokens of the language. A parser for parse trees can map the identifiers into keyword tokens of all languages, including the parse tree specific extensions. Therefore it's vital that the tree parser can distinguish tree specific keywords from identifiers of the parsed source code. This means that the grammar for parse trees is context sensitive, and the source code identifiers are quoted whenever they might be confused with tree specific keywords.

The root node (goal) of an parse tree is a unit node, whose children are declarations and definitions. Declarations and definitions only differ in the absence or presence of declarative or initialization information, the tree grammar and structure covers both declarations and definitions in the same descriptive construct. Descriptions cover abstract (type...) definitions and concrete (code, data...) declarations. An application maintains lists for all defined identifiers, separated into abstract (types...) and concrete identifiers (variables, constants, procedures...), which are used to determine the exact type of every identifier. The tree parser uses another list of tree specific identifiers (keywords).

The general parse tree syntax is:

Node ::= Literal | Tree.
Literal ::= '"' String '"' | "'" Char "'" | Number | Identifier | Comment.
Tree ::= ANY "(" [ Nodelist ] ")".
Nodelist ::= Node { "," Node } [ "," ].

The last optional "," in Nodelist allows to interpret a comma as either an list separator or an node terminator.
The ANY token in Tree must not conflict with other syntactically valid tokens in this place.

Unit ::= { Declaration }.
Declaration ::= ["static" | "implementation" | "extern" | "interface" ]
    ("type" TypeDefs | ( "const" | "var" ) DataDefs | "label" Identifiers | [ "procedure" ] ProcDefs ) ";".
TypeDefs ::= ( TypeDef )*","
TypeDef ::= Identifier "=" TypeSpec.
  TypeSpec ::= { TypeMod } ( BaseType | StructType | EnumType | SetType | ProcType | '"' TypeId '"' ).
    TypeMod ::= ArrayMod | ConstMod | VolatileMod | PointerMod.
      ArrayMod ::= "[" Expression "]".
      ConstMod ::= "const" | "#".
      PointerMod ::= "*".
      VolatileMod ::= "volatile" | "V".

BaseType ::= [ "unsigned" | "+" | "signed" | "-" ] ( Digit | Letter ).
StructType ::= ( "struct" | "record" | "R" | "union" | "U" ) [ ":" TagName ] [ "{" { StructMember }*"," "}" ].
  StructMember ::= [ Identifier ] ":" TypeSpec [ ":" Expression ].
EnumType ::= ("enum" | "E") [ ":" TagName ] [ "{" { EnumMember }*"," "}" ].
  EnumMember ::= Identifier [ "=" Expression ].
SetType ::= ("set" | "S") [ ":" TagName ] [ "{" Ordinal [ ":" Ordinal ] "}" ].
ProcType ::= "(" { Parameter }*";" ")" { ProcAttr } ":" TypeSpec.
  Parameter ::= [ Identifier ] ":" [ ( TypeSpec | "..." | "~" ) ].
  ProcAttr ::= ("cdecl" | "C") | ("fastcall" | "F") | ("inline" | "I") | ("method" | "M").

DataDefs ::= { DataDef }*",".
DataDef ::= Identifier [ ":" TypeSpec ] [ "=" Initializer | "extern" ].
  Initializer ::= Expression | "{" { Initializer }*"," "}".

ProcDef ::= Identifier ProcType ( [ DataDefs ] Block | "extern" | "forward" ).
  Block ::= "{" [ Declarations ] { Statement } "}".
    Statement ::= TransferStmt | LoopStmt | CondStmt | Block | Assignment ";".
TransferStmt ::= "break" [";"] | "continue" [";"] | "exit" [";"] | "goto" Label [";"] | "return" [ Expression ] ";".
  /* "try", "raise"... */
LoopStmt ::= ("do" | "repeat") Statement [ "until" ] Expression ";" | "while" Expression [ "do" ] Statement
    | "for" DataDefs ";" /*to*/ Expression ";" /*step*/ Expressions ";" /*do*/ Statement.
CondStmt ::= "if" "(" Expression /*then*/ Statement [ /*else*/ Statement ] ")" | ( "switch" | "case" "(" Expression ";" /*of*/ { Case } ")".
  Case ::= CaseLabel { Statement }.
    CaseLabel := { Ordinal [ (":" | "..") Ordinal ] } | "default".
Assignment ::= AssignOp "(" Designator ("," | ":=") Expression ")".
Designator ::= ( Identifier | Call | ArraySelector | MemberSelector | Deref ) { ":" TypeSpec }.
  Call ::= "(" Designator [":"] Expressions ")".
  ArraySelector ::=  "[" Designator "," Expression "]".
  MemberSelector ::= Punctuator "(" Designator Identifier ")".
  Deref ::= "@" Designator.
  /*Cast ::= ":" TypeSpec Designator.*/
Expressions ::= { Expression }*",".
  Expression ::= ( Operator "(" Expressions ")" | Literal | Assignment ) [ ":" TypeSpec ] | Designator.