Different sets of scopes can be used in the input (source language parsing) and output (target language transformation) phases. Both phases can be separated, with intermediate meta files holding the information from the input phase.
TSymbol represents a possibly anonymous declaration or definition. Various
subclasses exist for special symbol kinds.
TScope is a container for symbols. The symbols are accessible by their
scope index or by name.
TModule represents a translation unit, whose scope holds the definitions
of that unit.
Definitions differ from declarations only in the presence of a symbol definition (.Def: string), holding a type definition, and an optional value (.StrVal: string). Procedure definitions have an additional .Body:TSymBlock, a TStrings with an added scope.
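The relationship between the three classes can be sketched as follows. This is only an illustrative Python model, not the translator's actual declarations: the names `Symbol`, `Scope`, `Module`, `definition` and `str_val` are stand-ins for TSymbol, TScope, TModule, .Def and .StrVal, and the lookup behavior for anonymous symbols is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    """Stand-in for TSymbol: a possibly anonymous declaration or definition."""
    name: str = ""                     # empty name models an anonymous symbol
    definition: Optional[str] = None   # models .Def; None means "declaration only"
    str_val: Optional[str] = None      # models the optional .StrVal

    @property
    def is_definition(self) -> bool:
        return self.definition is not None


class Scope:
    """Stand-in for TScope: symbols are reachable by scope index or by name."""

    def __init__(self):
        self._items = []
        self._by_name = {}

    def add(self, sym: Symbol) -> int:
        index = len(self._items)
        self._items.append(sym)
        if sym.name:  # assumption: anonymous symbols are reachable by index only
            self._by_name.setdefault(sym.name, index)
        return index

    def by_index(self, index: int) -> Symbol:
        return self._items[index]

    def by_name(self, name: str) -> Optional[Symbol]:
        index = self._by_name.get(name)
        return None if index is None else self._items[index]

    def __len__(self) -> int:
        return len(self._items)


@dataclass
class Module:
    """Stand-in for TModule: a translation unit owning one scope."""
    name: str
    scope: Scope = field(default_factory=Scope)
```

A procedure definition would additionally carry a body with its own scope; that part is left out here to keep the sketch short.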
Other languages, like C, do not allow for such a simple structure. An attempt has to be made to separate the declarations in the C header files into general (global) declarations and translation-unit-specific declarations. The introduction of namespaces in C++ does not resolve this problem; it only adds overhead to the scope and unit management.
In a first translation approach all non-static declarations in C header files go into a Globals scope; any definitions go into the Globals or Statics (module related) scope. Every symbol contains a file-based location of its declaration or (if present) definition. Non-static symbols are copied from the Globals scope into the module-specific scope. This way multiple translation units can be parsed in sequence, updating the location information of the symbols in the Globals scope.
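A minimal sketch of this first approach, with scopes modeled as plain dictionaries. The class `Sym`, the function `declare` and the rule that a definition's location replaces a declaration's location are assumptions made for illustration; the notes do not specify the exact update rule.

```python
class Sym:
    """Hypothetical symbol record with a file-based location."""

    def __init__(self, name, location, is_static=False, is_definition=False):
        self.name = name
        self.location = location          # e.g. "api.h:3"
        self.is_static = is_static
        self.is_definition = is_definition


def declare(globals_scope, statics_scope, module_scope, sym):
    """Enter one parsed symbol into the scopes of the current translation unit."""
    if sym.is_static:
        # Static symbols stay in the module-related Statics scope.
        statics_scope[sym.name] = sym
        return sym
    known = globals_scope.setdefault(sym.name, sym)
    if known is not sym and sym.is_definition and not known.is_definition:
        # Assumption: a definition's location supersedes a declaration's.
        known.location = sym.location
        known.is_definition = True
    # "Copy" the global symbol into the module scope (shared object here).
    module_scope[sym.name] = known
    return known
```

Parsing a second translation unit with the same `globals_scope` but fresh `statics_scope` and `module_scope` dictionaries then updates the shared location information in place.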
In general several scopes have to be considered:
External scope (global)
[Typedef scope (global)]
Qualified scopes (structured types)
Module scope
Static scope
Local scopes
Temporary parameter scopes are created at least for old-style parameter lists, and are discarded after use.
In a first pass the translator populates the non-local scopes, with
all identifiers found in the declarations or meta file.
The external (imported) symbols are used to construct the Uses list(s)
of the unit, according to their file location. If desired, all global symbols
can be assigned to synthetic units beforehand. The Imports scope must contain
at least all used external symbols, as detected by the scanner. All collisions
with keywords of the target language must be detected (DupeCount).
In the next pass all collisions between names must be detected. This
can be accomplished by a case-insensitive search for unit-specific names
in the unit scope itself, in the Imports scope, as well as in the reserved
words.
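The case-insensitive search can be sketched as below. The function name `find_collisions` and the use of plain name lists instead of scope objects are assumptions for illustration only.

```python
def find_collisions(unit_names, import_names, reserved_words):
    """Return all case-insensitive clashes that involve a unit-specific name.

    The result maps the lower-cased spelling to the list of
    (origin, original spelling) pairs that share it.
    """
    seen = {}
    sources = (
        ("reserved", reserved_words),
        ("imports", import_names),
        ("unit", unit_names),
    )
    for origin, names in sources:
        for name in names:
            seen.setdefault(name.lower(), []).append((origin, name))
    return {
        key: entries
        for key, entries in seen.items()
        if len(entries) > 1 and any(origin == "unit" for origin, _ in entries)
    }
```

For example, unit names `Begin`, `count` and `Count` clash with the reserved word `begin` and with an imported `COUNT`, while a reserved word that no unit name matches is not reported.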
Name clashes can be handled in various ways:
Procedure bodies are parsed in a special mode (fMetaNames).
Parameter and Local scopes must contain original names for parser lookup,
and are destroyed at the end of the body parse. All these scopes are local
to the body, in proc.Body.Scopes[].
A Params scope with mangled names can be retained in the procedure,
for body parsers?
Meta output of procedures contains mangled names for parameters and locals -> new parameter list, possibly created from the Params scope? A flat Locals scope with all mangled names can also be maintained? (simplifies Pascal output!).
Body parsers need not necessarily use scopes; local names can be recognized as mangled names, and can be handled appropriately, even without scope lists. Otherwise the dedicated Params and Locals scopes can be used for symbol name lookup. When these scopes are not stored explicitly, they must be reconstructed when reading meta files!
For target output dedicated scopes should be created. Non-local scopes contain only unmangled (original) names; the DupeCount of the symbols can be determined whenever a symbol is added to such a scope (by inspection of all non-local scopes!).
Local scopes occur in subroutines, and contain the mangled names (for parser lookup). The handling of local scopes and symbols must be compatible with the persistent Params scope and symbols. Dedicated Params and Locals scopes should be used, either created and retained from the original code parse, or reconstructed from reading procedure bodies from meta files.
A prescan can be required to create all local scopes (Params, Locals), and to add all used external symbols to the frame scope(s). Only then can the DupeCount of the local symbols be determined, by a comparison of their unmangled names with the names in the frame scopes. A local Externals scope can be added, holding all external names as used in the procedure body. [Not required, imported symbols always must go into the appropriate frame scopes]
The Publics table can be created on the fly, i.e. from the non-local
symbols (with unmangled names), as found in the meta code of the modules.
A dummy unit (name???) can be used to hold all symbols with no assigned
unit.
Parameter names are visible both inside and outside a procedure. Therefore
they must not clash with imported and exported names, and also must be
unique within the same parameter list. Various decorations can be used,
depending on the actual DupeCount of the parameter symbol.
For a translation into Pascal or similarly structured languages, another set of scopes is required. The module scope can be used as-is, but another Imports scope collects all external symbols. In a first pass, the whole translation unit is scanned for imported symbols. Once these symbols are collected, the list of used modules can be constructed from the file locations of all imported symbols. If desired, the file locations of all global symbols can be updated beforehand, to reflect their desired locations in unit files.
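Deriving the list of used modules from file locations could look roughly like this. The function `build_uses_list`, the representation of imported symbols as (name, file path) pairs, and the rule "unit name = file name without extension" are all assumptions for the sake of the example.

```python
import os


def build_uses_list(imported_symbols, current_unit):
    """Collect the units to list in the Uses clause, in first-use order.

    imported_symbols: iterable of (symbol name, file location) pairs.
    current_unit:     name of the unit being translated (never listed).
    """
    uses = []
    for _name, location in imported_symbols:
        # Assumption: the unit name is the file name without its extension.
        unit = os.path.splitext(os.path.basename(location))[0]
        if unit != current_unit and unit not in uses:
            uses.append(unit)
    return uses
```

Reassigning global symbols to synthetic units would then amount to rewriting the location of each symbol before this function is called.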
All names can also be checked for collisions with keywords or homonyms of equivalent spelling. For this purpose the reserved words of the target language are used to initialize the Imports scope. Whenever a symbol is added to an output scope, the number of conflicting names in the list is determined, and stored as DupeCount in the symbol object; this value is used later in the creation of a unique (decorated) name for every symbol. The symbols in the module scope must be checked for dupes in an explicit dupecheck.
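The DupeCount bookkeeping can be sketched as follows. The classes `OutSymbol` and `OutputScope` are hypothetical, and the decoration scheme in `decorated_name` (appending `_<DupeCount>`) is only one conceivable choice; the notes leave the actual decoration open.

```python
class OutSymbol:
    """Hypothetical output symbol carrying its DupeCount."""

    def __init__(self, name):
        self.name = name
        self.dupe_count = 0


class OutputScope:
    """Output scope preloaded with the reserved words of the target language."""

    def __init__(self, reserved_words=()):
        # Number of entries seen so far per case-insensitive spelling.
        self._counts = {}
        for word in reserved_words:
            self._counts[word.lower()] = 1

    def add(self, sym):
        key = sym.name.lower()
        # DupeCount = number of conflicting names already in the scope.
        sym.dupe_count = self._counts.get(key, 0)
        self._counts[key] = sym.dupe_count + 1
        return sym


def decorated_name(sym):
    """Assumed decoration: leave unique names alone, suffix the others."""
    if sym.dupe_count == 0:
        return sym.name
    return f"{sym.name}_{sym.dupe_count}"
```

With the reserved word `type` preloaded, a source symbol `Type` receives DupeCount 1 and a later `type` receives DupeCount 2. A real implementation would also have to make sure that a decorated name does not collide with another existing name.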
Symbols in qualified scopes also deserve a special dupecheck, but only against reserved words and against dupes in their own scope. [This check is omitted, for now, it requires tracking of qualified scopes in the translator as well]
Symbols in subroutines deserve special handling. Since no persistent scopes are stored for local symbols, neither parameters nor local variables, corresponding temporary scopes are created while translating a procedure. A parameter scope is created from the parameter names, which must be used in every presentation of the subroutine header. Local symbols go into another temporary scope. A local scope [stack of?] must be maintained as well, for the lookup of all local symbols. Every encountered symbol (definition) is pushed onto the local scope stack, the DupeCount is determined, and the symbol is also added to the temporary subroutine scope, for the construction of the lists of local declarations (var, const, label). [A stack simplifies the implementation, later a single scope can be used, with local symbols popped off on exit of a block]
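One possible shape of the local scope stack is sketched below. The class `LocalScopes` and its method names are hypothetical. It is also an assumption that the DupeCount of a local symbol counts clashes with the frame names and with all earlier locals of the same procedure, since all of these end up in one procedure-level declaration list in Pascal.

```python
class LocalScopes:
    """Stack of block scopes plus a flat list of all local declarations."""

    def __init__(self, frame_names=()):
        # Names from the enclosing (frame) scopes, compared case-insensitively.
        self._frame = {name.lower() for name in frame_names}
        self._stack = [{}]         # one dict per open block, innermost last
        self.declarations = []     # (name, dupe_count) for the var/const lists

    def enter_block(self):
        self._stack.append({})

    def leave_block(self):
        # Symbols of the block are dropped from lookup, not from declarations.
        self._stack.pop()

    def define(self, name):
        key = name.lower()
        dupes = sum(1 for other, _ in self.declarations if other.lower() == key)
        if key in self._frame:
            dupes += 1
        entry = (name, dupes)
        self._stack[-1][name] = entry
        self.declarations.append(entry)
        return entry

    def lookup(self, name):
        # Source-language lookup is case-sensitive, innermost block first.
        for block in reversed(self._stack):
            if name in block:
                return block[name]
        return None
```

After the procedure has been translated, `declarations` holds every local symbol in order of appearance, which is what the procedure-level declaration lists need.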
In a final pass the translated code is emitted, consisting of:
With case insensitive target languages the spelling of identifiers is ignored, so that multiple symbols of effectively the same name can exist.
In a full-fledged version, distinct scopes must be used for struct members, parameters and local variables. Members of these scopes can only collide with reserved names, and with names inside their own scope [iff case insensitivity is involved!]. Consequently all symbols must be checked individually, when the target language is known. To prevent excessive analysis, every symbol must reside in an appropriate scope, even struct and enum members. The parser should check for qualified names, i.e. follow "." and "->" references to the appropriate symbol, and the qualified name of every symbol should be output into any textual representation.
It looks like a good idea to use prefixes, starting with a "$" for qualified
names, followed by a list of scope identifiers, separated again by "$"s:
ident ::= [ "$" { scope-id "$" } ] name.
where scope-id is any of:
globals: <none>
statics: <none> in file scope
enum-members: <none> - are all globals!
struct-members: <typeref>$name
parameters: <procref>["$0"]
locals: <procref>$<scope-number/list>
static-locals: <proc>$<scope-id>$<name>
It can be assumed that every visitor of the textual representation knows
about the current scope, at least about the current procedure for parameters
and local names. Care must be taken for structured types, where sub-scopes
can occur just as in procedures!
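The prefix notation can be taken apart and reassembled with a few lines of code. This sketch follows the grammar `ident ::= [ "$" { scope-id "$" } ] name` literally; the function names are hypothetical, and it assumes that neither scope identifiers nor names contain a "$" themselves.

```python
def split_ident(ident):
    """Split a textual identifier into (list of scope ids, plain name)."""
    if not ident.startswith("$"):
        return [], ident               # unqualified name
    parts = ident[1:].split("$")
    # Everything before the last "$" is a scope id, the rest is the name.
    return parts[:-1], parts[-1]


def join_ident(scope_ids, name):
    """Inverse of split_ident for a non-empty list of scope ids."""
    if not scope_ids:
        return name
    return "$" + "$".join(scope_ids) + "$" + name
```

For instance, `$main$2$count` splits into the scope ids `main` and `2` and the name `count`. Any abbreviated forms of the prefix would need extra handling that is not shown here.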
Possibly a shortcut ("$$") can be used to identify names outside
a local (proc) scope, and local symbols can be flagged with a scope identifier
only.
Then it's up to the visitor to maintain possibly different target-scopes (common to structs and procedures), with appropriate nesting. Within every target-scope, every unqualified name must be unique, and must not collide with a reserved word. It also must not hide used symbols in outer scopes! [For simplicity, it can be assumed that no case-sensitive name hides a name from an outer scope.]
So far, so good ;-)
Next all struct members must be handled, but only when these conflict with reserved words. Otherwise unique member names are assumed. [C++: This will not hold for overloaded procedures, global or within classes].
Care must be taken to assign a DupeCount only once, to every symbol object, not whenever the list is modified later(?). The DupeCount depends on the target language, not on the source language!
In later steps all module (non-global) symbols are added, for the module to translate, and the local symbols for procedures to translate.
Local symbols need special consideration. Currently a procedure body is stored as a flat string list, with unrolled (but embedded) blocks. We can assume that only local symbols are defined within blocks, which can be removed on exit from a block. It must be assured that no external symbols prevent the removal of local symbols.
In a translation from C to Pascal, local symbols can be declared only on procedure level. Since variables in local blocks can have different types, a list of really all local variables is required. It would be a good idea to have a flat list of all scopes, for the creation of such a list...
As a preliminary solution, all local variables are mangled into __<scope#>_<name>, so that they are all unique. The removal of the decoration can be handled later.
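The preliminary mangling scheme and its later removal can be sketched as a pair of helper functions. The names `mangle` and `demangle` are hypothetical, and the sketch assumes that the scope number is a non-negative decimal integer and that no original identifier already has the form `__<digits>_<name>`.

```python
import re

# "__", a decimal scope number, "_", then the original name.
_MANGLED = re.compile(r"^__(\d+)_(.+)$")


def mangle(scope_number, name):
    """Build the unique local name __<scope#>_<name>."""
    return f"__{scope_number}_{name}"


def demangle(ident):
    """Return (scope number, original name), or None if ident is not mangled."""
    match = _MANGLED.match(ident)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

A body parser that works without scope lists could use `demangle` to recognize local names directly from their spelling.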