Unlike in C/C++, the basic unit of compilation is an entire directory. The compiler effectively concatenates each file whose name ends in .orth in the input directory into a single input file. In Orth 0.3, all of a program's code must be in the input directory. Future versions of Orth will allow a source file to include all of the source code (i.e., files ending in .orth) from another directory.
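The directory-as-compilation-unit model can be pictured with a short Python sketch. This is only an illustration, not the compiler's implementation: the function name gather_orth_source is hypothetical, and both the sorted file order and the newline separator are assumptions, since the text above doesn't say in what order files are concatenated.

```python
from pathlib import Path


def gather_orth_source(input_dir: str) -> str:
    """Join the text of every file ending in .orth directly inside input_dir."""
    # Assumption: files are visited in sorted name order and separated by a
    # newline. Subdirectories are ignored, matching the Orth 0.3 rule that
    # all of a program's code lives in the input directory itself.
    files = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.name.endswith(".orth")
    )
    return "\n".join(p.read_text(encoding="utf-8") for p in files)
```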
The compiler maintains a program database for each executable inside the executable itself. The program database contains all of the information the compiler needs for incremental compilation and all of the information the debugger needs to set breakpoints, inspect variables, and roll back the call stack. The program database resides in two unmapped sections of the executable: DEBUG and INDEX. When you run the executable, the loader does not copy these sections into the program's virtual memory. The compiler and debugger access the program database by opening the executable and mapping it into their address space.
compiler [-rebuild] <input directory> [<output path>]
The compiler reads each file whose name ends in .orth in the input directory and creates an executable at the specified location (if an output path is specified) or at the default location. The specified output path (if present) should contain the executable's directory (relative to the input directory) and the executable's filename. The compiler selects a default location as follows.
If the target architecture is x86 (i.e., 32 bits), the output directory is a subdirectory named "Debug" inside the input directory.
If the target architecture is x64 (i.e., 64 bits), the output directory is a subdirectory named "Debug64" inside the input directory.
The output filename is the input directory name followed by .exe (if the output is an application) or .dll (if the output is a library). For example, if the input directory is d:/mycode, the target architecture is x86, and the program is an application, the default path is d:/mycode/Debug/mycode.exe.
If the output path (or any subdirectory within the output path) doesn't exist, then the compiler creates it.
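The default-location rules above can be summarized in a few lines of Python. This is a sketch of the rules as stated, not the compiler's code; the function name default_output_path and its parameters are hypothetical, and forward-slash paths are assumed as in the d:/mycode example.

```python
from pathlib import PurePosixPath


def default_output_path(input_dir: str, arch: str, is_library: bool) -> str:
    """Return the default executable path for an input directory."""
    # "Debug" for x86 builds, "Debug64" for x64 builds.
    subdir = {"x86": "Debug", "x64": "Debug64"}[arch]
    root = PurePosixPath(input_dir)
    # The filename is the input directory's name plus .exe or .dll.
    ext = ".dll" if is_library else ".exe"
    return str(root / subdir / (root.name + ext))


print(default_output_path("d:/mycode", "x86", False))  # d:/mycode/Debug/mycode.exe
```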
The -rebuild option disables incremental compilation. The compiler generates new machine code for each function rather than reusing machine code from the old executable. Specifying -rebuild has the same effect as deleting the old executable.
The compiler supports two architectures: x86 and x64. The compiler doesn't support cross compilation, meaning that the x86 compiler is only able to generate x86 executables and the x64 compiler is only able to generate x64 executables.
Orth 0.3 supports only debug builds. Future versions of Orth will provide a command-line switch for stripping debugging information from the executable and optimizing the machine code for faster execution using a third-party compiler backend (e.g., LLVM).
There are four stages of compilation. Each stage must complete without errors in order for the next stage to begin.
The compiler achieves fast compilation times by compiling only chunks which have changed since the last compilation. Each statement within the global scope and each statement within an aggregate belongs to its own chunk. Unlike C/C++, a chunk is the smallest unit of compilation.
Each chunk has zero or more child chunks, zero or more declarations, and one or more (not necessarily contiguous) tokens. The example below contains six chunks:

    struct X
    {
        int a
        int foo(int b) { return a+b }
        int bar(int c) { return foo(c) }
    } x

ID | Children | Declarations | Tokens |
---|---|---|---|
1 | 2, 3, 5 | X | struct X { } x |
2 | none | a | int a |
3 | 4 | foo and b | int foo(int b) { } |
4 | none | none | return a+b |
5 | 6 | bar and c | int bar(int c) { } |
6 | none | none | return foo(c) |
A chunk can change either directly (because its tokens have changed) or indirectly (because it depends on another chunk which has changed either directly or indirectly). In this example, replacing int a with long a will invalidate chunk 2, chunk 1 (because the layout of X has changed), and chunk 4 (because the common type for the addition has changed). Chunks 3, 5, and 6 are unaffected. This example illustrates the importance of separating a function's declaration from its body. Changes to a function's body typically don't affect other functions which call it. This optimization drastically reduces the number of chunks the compiler needs to recompile.
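The propagation of direct and indirect changes can be modeled as a walk over a dependency graph. The Python sketch below is illustrative only: the DEPENDS_ON edges are an assumed reading of the struct X example (the text above states only which chunks are invalidated, not the full graph), and the function name invalidated is hypothetical.

```python
from collections import deque

# Assumed dependency edges for the struct X example: DEPENDS_ON[c] lists the
# chunks whose declarations chunk c uses.
DEPENDS_ON = {
    1: {2},     # the layout of X depends on member a
    4: {2, 3},  # "return a+b" reads a (chunk 2) and b (chunk 3)
    6: {3, 5},  # "return foo(c)" calls foo (chunk 3) and reads c (chunk 5)
}


def invalidated(changed, depends_on):
    """Return every chunk that changed directly or indirectly."""
    # Invert the graph: for each chunk, which chunks depend on it?
    dependents = {}
    for chunk, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(chunk)
    dirty = set(changed)
    queue = deque(changed)
    while queue:
        for user in dependents.get(queue.popleft(), ()):
            if user not in dirty:
                dirty.add(user)
                queue.append(user)
    return dirty


print(sorted(invalidated({2}, DEPENDS_ON)))  # [1, 2, 4]
print(sorted(invalidated({4}, DEPENDS_ON)))  # [4]
```

The second call shows why a function's body is its own chunk: under these assumed edges, editing the body of foo (chunk 4) dirties nothing else, whereas editing its declaration (chunk 3) would also dirty chunks 4 and 6.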
Incremental compilation can fail for several reasons:
If incremental compilation fails, the compiler does a full build instead.
Each Orth application must define exactly one global main() function with no arguments that returns void.
Unlike in C/C++, an Orth program is responsible for obtaining its command-line arguments and environment using a system call and for returning control to the operating system using a system call. As such, Orth doesn't support argc, argv, envp, or an integer exit code. The compiler accepts any calling convention for main() (internal, cdecl, or stdcall) because the operating system never calls the main() function. Instead, the operating system initializes the IP register with the address of the first machine instruction in main(). The compiler inserts an INT3 instruction after the last statement in main() to ensure that the program exits properly. In all other respects, main() behaves like an ordinary function (you can call it, take its address, and export it, but you can't import it). A typical Orth application will look like this:

    void main()
    {
        Wchar^ cmdln:=GetCommandLine()
        ...
        ExitProcess(exitCode)
    }

An Orth library must define exactly one global entry point with the following signature:

    stdcall int DllMain(void^,uint,void^)

The system calls DllMain() using the stdcall convention when it loads the DLL into memory. The program need not export DllMain() because the compiler records its address in the executable. In all other respects, DllMain() behaves like an ordinary function (you can call it, take its address, and export it, but you can't import it).