Orth Compiler

Introduction
Command Line
Configurations
Stages of Compilation
Incremental Compilation
Entry Point

Introduction

Unlike in C/C++, the basic unit of compilation in Orth is an entire directory rather than a single source file. The compiler effectively concatenates every file with the .orth extension in the input directory into a single input file. In Orth 0.3, all of a program's code must be in the input directory. Future versions of Orth will allow a source file to include all of the source code (i.e., files ending in .orth) from another directory.
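The concatenation step described above can be pictured with a short Python sketch. This is only an illustration, not the compiler's code: the function name concatenate_sources is hypothetical, and the alphabetical file order, the newline separator, the UTF-8 encoding, and the non-recursive directory scan are all assumptions that this document does not specify.

```python
from pathlib import Path


def concatenate_sources(input_dir: str) -> str:
    """Join every .orth file directly inside input_dir into one string.

    Assumptions (not stated by the Orth documentation): files are taken
    in alphabetical order, read as UTF-8, and separated by a newline.
    Subdirectories are not searched.
    """
    sources = sorted(
        path
        for path in Path(input_dir).iterdir()
        if path.is_file() and path.suffix == ".orth"
    )
    return "\n".join(path.read_text(encoding="utf-8") for path in sources)
```

Files with any other extension in the input directory are simply ignored by this sketch.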

The compiler maintains a program database for each executable inside the executable itself. The program database contains all of the information the compiler needs for incremental compilation and all of the information the debugger needs to set breakpoints, inspect variables, and roll back the call stack. The program database resides in two unmapped sections of the executable: DEBUG and INDEX. When you run the executable, the loader does not copy these sections into the program's virtual memory. The compiler and debugger access the program database by opening the executable and mapping it into their address space.

Command Line

compiler [-rebuild] <input directory> [<output path>]

The compiler reads every file with the .orth extension in the input directory and creates an executable at the specified output path or, if none is given, at a default location. The output path, if present, should contain the executable's directory (relative to the input directory) and the executable's filename.

The compiler selects the default location as follows. If the target architecture is x86 (i.e., 32 bits), the output directory is a subdirectory named "Debug" inside the input directory. If the target architecture is x64 (i.e., 64 bits), the output directory is a subdirectory named "Debug64" inside the input directory. The output filename is the input directory name followed by .exe (if the output is an application) or .dll (if the output is a library). For example, if the input directory is d:/mycode, the target architecture is x86, and the program is an application, the default path is d:/mycode/Debug/mycode.exe. If any directory in the output path doesn't exist, the compiler creates it.
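The default-location rule can be summarized in a few lines of Python. This is a sketch of the rule as described above, not the compiler's implementation; the function name default_output_path and its parameters are hypothetical, and forward-slash paths are assumed to match the example in the text.

```python
from pathlib import PurePosixPath


def default_output_path(input_dir: str, arch: str, is_library: bool) -> str:
    """Return the default executable path for an input directory.

    arch must be "x86" or "x64". The function and parameter names are
    illustrative only.
    """
    subdirectories = {"x86": "Debug", "x64": "Debug64"}
    if arch not in subdirectories:
        raise ValueError(f"unsupported architecture: {arch}")
    root = PurePosixPath(input_dir)
    extension = ".dll" if is_library else ".exe"
    # The filename is the input directory's own name plus the extension.
    return str(root / subdirectories[arch] / (root.name + extension))
```

Calling default_output_path("d:/mycode", "x86", False) reproduces the example path d:/mycode/Debug/mycode.exe.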

The -rebuild option disables incremental compilation. The compiler generates new machine code for each function rather than reusing machine code from the old executable. Specifying -rebuild has the same effect as deleting the old executable.

Configurations

The compiler supports two architectures: x86 and x64. The compiler doesn't support cross compilation, meaning that the x86 compiler is only able to generate x86 executables and the x64 compiler is only able to generate x64 executables.

Orth 0.3 supports only debug builds. Future versions of Orth will provide a command-line switch for stripping debugging information from the executable and optimizing the machine code for faster execution using a third-party compiler backend (e.g., LLVM).

Stages of Compilation

There are four stages of compilation. Each stage must complete without errors in order for the next stage to begin.

Parsing:: The lexer generates a sequence of tokens and the parser builds a syntax tree.
Semantic Analysis:: The compiler searches the syntax tree for declarations and makes a symbol table for each scope.
Code Generation:: The compiler generates machine code and a relocation table for each function.
Linking:: The compiler patches each function's machine code with the actual addresses of the functions it calls and the shared variables it accesses. The compiler then writes the program image, including machine code, program data, an import table, an export table (for DLLs), a rebasing table (for DLLs), and the program database, to the output path.
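To make the relationship between the relocation table and the linking stage concrete, here is a minimal Python sketch of address patching. The relocation format is an assumption: this document does not say how Orth encodes relocations, so the sketch uses (offset, symbol) pairs and writes 32-bit little-endian absolute addresses, as would suit an x86 target. The name patch_function is hypothetical.

```python
import struct


def patch_function(code: bytes, relocations, addresses) -> bytes:
    """Return a copy of code with each relocation slot filled in.

    relocations: iterable of (offset, symbol) pairs (assumed format).
    addresses:   mapping from symbol name to its final 32-bit address.
    """
    patched = bytearray(code)
    for offset, symbol in relocations:
        # Overwrite the 4-byte placeholder with the symbol's address,
        # stored little-endian as x86 expects.
        struct.pack_into("<I", patched, offset, addresses[symbol])
    return bytes(patched)
```

For instance, patching the bytes A1 00 00 00 00 C3 with the relocation (1, "counter") and the address 0x00403000 for "counter" yields A1 00 30 40 00 C3.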

Incremental Compilation

The compiler achieves fast compilation times by compiling only the chunks that have changed since the last compilation. Each statement within the global scope and each statement within an aggregate belongs to its own chunk. A chunk is the smallest unit of compilation, which is far finer-grained than the source file that a C/C++ compiler recompiles.

Each chunk has zero or more child chunks, zero or more declarations, and one or more (not necessarily contiguous) tokens. The example below contains six chunks:

struct X
{
    int a
    int foo(int b)
    {
        return a+b
    }
    int bar(int c)
    {
        return foo(c)
    }
} x

ID	Children	Declarations	Tokens
1	2,3,5	`X`	`struct X { } x`
2	none	`a`	`int a`
3	4	`foo` and `b`	`int foo(int b) { }`
4	none	none	`return a+b`
5	6	`bar` and `c`	`int bar(int c) { }`
6	none	none	`return foo(c)`

A chunk can change either directly (because its tokens have changed) or indirectly (because it depends on another chunk that has changed either directly or indirectly). In this example, replacing int a with long a will invalidate chunk 2, chunk 1 (because the layout of X has changed), and chunk 4 (because the common type for the addition has changed). Chunks 3, 5, and 6 are unaffected. This example illustrates the importance of separating a function's declaration from its body: although the body of foo (chunk 4) is invalidated, the call to foo in chunk 6 depends only on foo's declaration (chunk 3), which has not changed. Changes to a function's body typically don't affect other functions that call it. This optimization drastically reduces the number of chunks the compiler needs to recompile.
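Indirect invalidation amounts to a graph traversal over chunk dependencies. The Python sketch below illustrates the idea under stated assumptions: the dependency edges in EXAMPLE_DEPENDS_ON are inferred from the struct X example (chunk 1 depends on chunk 2 for the layout, chunk 4 uses a from chunk 2, chunk 6 calls foo declared in chunk 3), and the names used here are hypothetical. The actual bookkeeping in the program database may differ.

```python
from collections import deque

# Assumed dependency edges for the struct X example: chunk -> chunks it depends on.
EXAMPLE_DEPENDS_ON = {1: {2}, 4: {2}, 6: {3}}


def invalidated_chunks(directly_changed, depends_on):
    """Return every chunk that changed directly or indirectly."""
    # Invert the map so we can look up the dependents of a chunk.
    dependents = {}
    for chunk, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, set()).add(chunk)

    invalid = set(directly_changed)
    queue = deque(invalid)
    while queue:  # breadth-first walk over dependents
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in invalid:
                invalid.add(dependent)
                queue.append(dependent)
    return invalid
```

With these edges, invalidated_chunks({2}, EXAMPLE_DEPENDS_ON) gives chunks 1, 2, and 4, matching the example, while a change confined to foo's body, invalidated_chunks({4}, EXAMPLE_DEPENDS_ON), invalidates chunk 4 alone.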

Incremental compilation can fail for several reasons:

the old executable is unavailable or inaccessible
the old executable doesn't contain a program database
the program database's version doesn't match the compiler's version
the program database is corrupt

If incremental compilation fails, the compiler does a full build instead.
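The fallback rule can be read as a series of checks made before any code is reused. The following Python sketch only illustrates that decision; the OldExecutable record, its fields, and the choose_build_mode function are hypothetical and do not describe the compiler's internal data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OldExecutable:
    """Hypothetical summary of what is found at the output path."""
    readable: bool                    # False if unavailable or inaccessible
    database_version: Optional[str]   # None if there is no program database
    database_intact: bool = True      # False if the database is corrupt


def choose_build_mode(old: Optional[OldExecutable], compiler_version: str,
                      rebuild: bool = False) -> str:
    """Return "incremental" or "full" following the rules listed above."""
    if rebuild:                                   # -rebuild was specified
        return "full"
    if old is None or not old.readable:           # executable unavailable
        return "full"
    if old.database_version is None:              # no program database
        return "full"
    if old.database_version != compiler_version:  # version mismatch
        return "full"
    if not old.database_intact:                   # corrupt database
        return "full"
    return "incremental"
```

Every failing check leads to the same outcome, a full build, so the order of the checks affects only which reason is detected first.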

Entry Point

Each Orth application must define exactly one global main() function with no arguments that returns void. Unlike in C/C++, a program is responsible for obtaining its command-line arguments and environment using a system call and for returning control to the operating system using a system call. As such, Orth doesn't support argc, argv, envp, or an integer exit code returned from main(). The compiler accepts any calling convention for main() (internal, cdecl, or stdcall) because the operating system never calls the main() function. Instead, the operating system initializes the IP register with the address of the first machine instruction in main(). The compiler inserts an INT3 instruction after the last statement in main() to ensure that the program exits properly. In all other respects, main() behaves like an ordinary function (you can call it, take its address, and export it, but you can't import it). A typical Orth application will look like this:

void main()
{
    Wchar^ cmdln:=GetCommandLine()
    ...
    ExitProcess(exitCode)
}

An Orth library must define exactly one global entry point with the following signature:

stdcall int DllMain(void^,uint,void^)