Last updated 130413


Orth is an object-oriented language and IDE for Windows on x86 and x64. "Orth" is the first four letters of "orthogonal", which means the features of a language can be used without thinking about how that usage will affect other features. For example, if int[2] a declares an array of 2 integers, and int[1] a declares an array of 1 integer, then int[0] a should declare an array of zero integers. The size of an array is equal to its element count multiplied by the size of its element type, so the size of int[0] is zero. Another example is parameter passing. An orthogonal language should use the same rule for passing a parameter of any type. If the function makes a copy of a structure parameter, then it should also make a copy of an array parameter.


When I started Orth in 2006, my primary focus was on features. I had a list of over 100 features that I needed at one time or another, and I wanted to make the ultimate language. As I implemented each feature, the compiler became more and more complicated. Pretty soon, I realized that I would rarely (if ever) use 90% of the code that I was writing. I would spend days searching for solutions to problems that would never arise in practice. In 2009, I scrapped the project and started over with a new set of goals. First, I wanted to get a complete IDE up and running as quickly as possible. Second, I wanted to divide my time proportionally between each part of the IDE. Third, I wanted to stay focused on important goals and save novelty features for later.

Incremental Compilation

One of my pet peeves is compilation speed. Small projects compile instantly in Visual Studio and gcc, but as the project becomes large (e.g., 100,000 lines or more), seemingly small changes take several seconds to compile and link. This may seem insignificant when a massive library like glibc takes 30 minutes to compile, but programming is all about momentum. When your hands are moving at lightning speed and your mind is racing, waiting several seconds for your program to compile seems like an eternity.

Incremental compilation is complicated, but it has many benefits. One benefit, of course, is faster compilation. Another benefit is automatic module imports. Maintaining a list of imports at the top of each module is a tedious chore that keeps you from doing real programming. Trying to put declarations in the proper order and writing forward declarations is another waste of time. A third benefit is that incremental compilation automatically maintains a "program database" containing the name, type, and scope of every entity in the program. The debugger can access information in this database using the same code as the compiler.

In Orth, the smallest unit of compilation is a chunk, which can be a function, a variable, a typedef, a struct, etc. Changing a chunk causes the compiler to delete the old code and data for that chunk from the program database, recompile the chunk, and then insert the new code and data into the database. The compiler looks for other chunks that would be affected by the change and recompiles them also. Finally, the compiler "links" the code and data for each chunk and copies it into an executable.
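The recompile-affected-chunks step can be sketched as follows. This is my own illustration, not Orth's implementation: a dictionary stands in for the program database, a string stands in for generated code, and a reverse-dependency map records which chunks must be recompiled when a given chunk changes.

```python
class ProgramDatabase:
    def __init__(self):
        self.sources = {}     # chunk id -> source text
        self.compiled = {}    # chunk id -> compiled artifact
        self.dependents = {}  # chunk id -> ids of chunks that use it

    def add_chunk(self, cid, source, uses=()):
        self.sources[cid] = source
        for dep in uses:
            self.dependents.setdefault(dep, set()).add(cid)
        self._compile(cid)

    def _compile(self, cid):
        self.compiled[cid] = "code:" + self.sources[cid]  # stand-in for codegen

    def change_chunk(self, cid, new_source):
        self.sources[cid] = new_source
        worklist, recompiled = [cid], []
        while worklist:
            c = worklist.pop()
            if c in recompiled:
                continue
            del self.compiled[c]    # delete the old code and data
            self._compile(c)        # recompile the chunk
            recompiled.append(c)
            worklist.extend(self.dependents.get(c, ()))  # affected chunks
        return recompiled

db = ProgramDatabase()
db.add_chunk("sqr", "int sqr(int x)...")
db.add_chunk("main", "int main()...", uses=["sqr"])
order = db.change_chunk("sqr", "long sqr(long x)...")  # sqr first, then main
```

Changing the signature of "sqr" recompiles "sqr" itself and then "main", which references it; untouched chunks keep their old code.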


The Orth compiler blurs the traditional distinction between compiler and linker. Rather than generating an "object file" for each module, the compiler remembers code, data, and relocations for each chunk in its program database. Each time a chunk accesses code or data in another chunk, the compiler adds an entry to the chunk's relocation table. Each entry contains two pieces of information: (1) the offset in the chunk's code or data that needs patching; and (2) the ID of the accessed chunk. Linking is very simple because the compiler doesn't need to "reconcile" symbol tables from multiple object files. Instead, it concatenates the code and data from each chunk, stores the address of each chunk in a table, and then patches each chunk's code and data according to its relocation table.
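The concatenate-and-patch step can be sketched like this. It is a simplified model of my own (not Orth's database format), assuming 4-byte little-endian absolute addresses and image offsets in place of real virtual addresses:

```python
import struct

def link(chunks):
    # chunks: {id: (code_bytes, relocations)}, where each relocation
    # entry holds (offset to patch, ID of the accessed chunk).
    image, base = bytearray(), {}
    for cid, (code, _) in chunks.items():
        base[cid] = len(image)          # address table: chunk id -> address
        image += code                   # concatenate code and data
    for cid, (_, relocs) in chunks.items():
        for offset, target in relocs:   # patch per the relocation table
            where = base[cid] + offset
            image[where:where + 4] = struct.pack("<I", base[target])
    return bytes(image), base

# "main" is 10 bytes and references "helper"'s address at offset 2.
chunks = {
    "main":   (b"\xb8" + b"\x00" * 9, [(2, "helper")]),
    "helper": (b"\x90" * 4, []),
}
image, base = link(chunks)
```

Because every cross-chunk reference is a (offset, chunk ID) pair, no symbol-table reconciliation happens at link time; the loop just looks up the target's address and writes it into place.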


Orth doesn't support static libraries because linking to shared libraries is much easier. For instance, the compiler doesn't need to assimilate debugging information from a variety of sources into its program database. On Windows, Orth programs link directly to Windows DLLs (KERNEL32.DLL, USER32.DLL, etc.). Orth doesn't require so-called "import libraries" because it has all of the information it needs to correctly generate a call to a DLL function. Orth currently doesn't support Linux, but if Linux support is added, then Orth programs will make system calls using inline assembly. The C runtime library (CRT) is unnecessary on Windows and Linux because each has a well-documented API of its own.


Orth supports only x86 and x64. Since almost all personal computers use one of these two architectures, I see little reason to add support for other architectures. Eventually, Orth will have a "smart" inline assembler like Visual Studio's. The objective is to freely mix assembly with Orth code and let the compiler worry about details like register allocation. Inline assembly is the simplest way of implementing low-level functions like memcpy(), memset(), and controlfp(), and of using 32-bit MUL and DIV instructions to their full potential. Of course, the compiler could provide a long list of intrinsics in every shape and form, but why go to all that trouble when you could just write a function in assembly language to do exactly what you want? Another objective is to write assembly code for x86 that will also assemble on x64 with no changes. For instance, you might want to move EAX to EBX on x86 and RAX to RBX on x64. Ideally, you should be able to express your intent using a single instruction.


The C preprocessor is useful. There's no denying it. The easiest way to accomplish some tasks is by doing token-by-token substitutions. Unfortunately, the C preprocessor is a wild beast that can wreak havoc on your program if left unchecked (just look at the glibc code). The trouble with the C preprocessor is that it operates at the token level rather than the semantic level. C macros have no scope, so they are able to change the meaning of any symbol in your entire program. C macros also make code navigation difficult because they can compose symbols using the token-pasting operator. Reading someone else's code and trying to find the definition of a symbol can be nearly impossible if you're not sure which macro declared the symbol. In Orth, the 'alias' statement provides much of the power of AST substitutions without the shortcomings of the C preprocessor. An alias declares a symbol in the current scope. Accessing the alias inserts the body of the alias into the AST and compiles it. If the access occurs through an object (e.g., myobject.myalias), then the compiler will look up each symbol of the alias body in the scope of the object. For example:

struct MyFlags
    uint flags
    alias foo:=(flags&1)!=0
    alias bar:=(flags&2)!=0
MyFlags mf
if(mf.foo && mf.bar) //same as if((mf.flags&1)!=0 && (mf.flags&2)!=0)
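For comparison, a rough Python analogue uses properties (illustration only; Orth aliases are substituted into the AST at compile time, whereas Python properties are re-evaluated at run time):

```python
class MyFlags:
    def __init__(self, flags=0):
        self.flags = flags

    @property
    def foo(self):                  # alias foo := (flags & 1) != 0
        return (self.flags & 1) != 0

    @property
    def bar(self):                  # alias bar := (flags & 2) != 0
        return (self.flags & 2) != 0

mf = MyFlags(3)
both = mf.foo and mf.bar            # same as (mf.flags&1)!=0 and (mf.flags&2)!=0
```

As with the alias, each access is resolved in the scope of the object it is accessed through.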

IDE Integration

The compiler, debugger, and editor are tightly integrated components. The debugger needs to understand Orth constructs like aliases in order to properly evaluate the expression beneath the cursor. The editor needs to understand Orth indentation rules and line continuation rules in order to display and navigate the code properly. The debugger uses the compiler's program database to set breakpoints, inspect memory, and unwind the stack. The editor and compiler use a very similar pushdown automaton for parsing Orth files. In fact, I use a single program, grammargen, and a single grammar input file, grammar.txt, to generate the state-transition table for each one. Developing these three components in parallel is an important goal for this project. When considering a new language feature, I have to consider not only the added complexity in the compiler, but also the difficulty of adding editor and debugger support. C/C++ preprocessor macros are a prime example. Adding rudimentary preprocessor support to the Orth compiler would be a weekend project, but preprocessing makes incremental color coding and context tagging extremely difficult. Indeed, Visual Studio and Visual SlickEdit are still confused by some obscure preprocessor constructs like token pasting.


The IDE is completely isolated from other software and libraries. I use the most basic Win32 calls to interact with the operating system and do everything else myself. As a result, the Orth project doesn't inherit the defects, overhead, and complexity of the C Runtime Library (CRT), the C++ Standard Template Library (STL), and Windows frameworks such as .NET. When I encounter a bug, I'm able to fix it correctly without introducing an ugly hack. When something doesn't make sense, I'm able to rewrite all of the affected code. Alas, my attempt to isolate myself from other people's coding mistakes fails when I reach the Windows API because it's poorly designed, poorly documented, and often broken, but luckily, I don't have to look at it very often.


Writing everything from scratch might seem like wasted effort, but it has given me considerable insight into how software works and how software should work. Take multithreading, for example. Conventional wisdom says that each thread should use a variety of mutual exclusion objects (mutexes) to guarantee that no two threads attempt to access the same data at once. In Orth, I take the opposite approach. Each thread runs exclusively until it explicitly yields control to another thread. In a conventional multithreaded program, forgetting a mutex is disastrous. The program will mysteriously crash once in a while due to an obscure race condition buried in hundreds of thousands of lines of code (perhaps code that you didn't even write). With my system, forgetting a call to yield() makes the program unresponsive for several seconds, or in the worst case, deadlocks it. The problem is typically easy to find and easy to fix. For example, if a regular-expression search causes the editor to freeze, the solution is to insert a yield() callback inside the regular-expression engine.
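The yield-callback fix can be sketched like this (hypothetical names; a plain substring scan stands in for the real regular-expression engine):

```python
def search_lines(lines, pattern, yield_cb, every=1000):
    # Periodically invoke the caller-supplied yield() so the cooperative
    # scheduler can let other threads (e.g. the GUI) run.
    hits = []
    for i, line in enumerate(lines):
        if i and i % every == 0:
            yield_cb()              # give up the processor briefly
        if pattern in line:         # stand-in for real regex matching
            hits.append(i)
    return hits

yields = []
hits = search_lines(["x"] * 5000 + ["needle"], "needle",
                    lambda: yields.append(1))
```

Scanning 5001 lines with a yield every 1000 iterations keeps the editor responsive without any locking inside the search itself.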

Blocking system calls are challenging because you obviously can't call yield() inside the function (unless the function has a callback parameter). If you open a network file using CreateFile() and the function blocks for several seconds, the user should be able to cancel the operation by pressing escape. The solution is to schedule another thread to run just after making the system call and return from the system call only after that other thread is suspended. As long as we pass pointers to local variables to the system function, there is no chance of a race condition (i.e., two threads attempting to simultaneously access the same memory location). Aborting a thread is also straightforward. When the system call is about to return, we check whether the user has pressed escape. If so, we throw an exception. The exception unwinds the stack, deallocating memory, closing file handles, etc. The exception travels all the way up to the thread's main function where it kills the thread.

Another interesting aspect of multithreading is fibers. Suppose you write a thread to open a file using a call to CreateFile(). If CreateFile() fails, you'd like to display a message box describing why the operation failed. Naively, you would write something like this:
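In Win32-style pseudocode (arguments elided), the naive version would be something like:

    HANDLE h = CreateFile(path, GENERIC_READ, ...);
    if(h == INVALID_HANDLE_VALUE)
        MessageBox(NULL, "Could not open the file.", "Error", MB_OK);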


This code won't work, however, because a worker thread is not allowed to create windows. This is an unfortunate limitation of the Win32 API. The thread which creates a window is also responsible for servicing messages for that window. To my knowledge, a thread cannot create a window on behalf of another thread. The solution is fibers! A fiber contains two important pieces of information: an instruction pointer and the call stack. Essentially, a fiber is a bookmark that allows you to suspend execution and return to that point in the code later. What's remarkable about fibers is that you can reschedule them on different threads. You can start a fiber running on thread A, perform some work like calling CreateFile(), reschedule the fiber on thread B, perform some work like displaying a message box, and maybe return to thread A to try calling CreateFile() a second time. The code above becomes:

    switchToWorkerThread(); //Each fiber initially runs in the main (GUI) thread.
                            //This call suspends the fiber and creates a new thread
                            //whose main function resumes the fiber.
    HANDLE h = CreateFile(path, GENERIC_READ, ...);
    if(h == INVALID_HANDLE_VALUE)
        switchToMainThread(); //Reschedule the fiber to run in the main (GUI) thread
        MessageBox(NULL, "Could not open the file.", "Error", MB_OK);

This code isn't as elegant as the first example (you still have to remember to call switchToWorkerThread() and switchToMainThread()), but this system is far better than the conventional solution of sending a message from the worker thread to the GUI thread instructing it to display the message box.

My multithreading API has many features which are too numerous to describe here. The implementation uses a single mutex. When a thread starts, it locks the mutex. When a thread yields, it unlocks the mutex and then immediately attempts to lock it again. Unlocking the mutex gives other threads an opportunity to run. The details of system calls are hidden inside a macro: BLOCKING_EXPR. This macro simply unlocks the mutex, calls the system function, saves the result in a temporary variable, locks the mutex, and returns the temporary variable. The BLOCKING_EXPR macro is, in turn, hidden inside my file API. For example, flOpenFile() contains BLOCKING_EXPR(CreateFile(...)). As a result, a thread can open a file and read data from it without worrying about starving other threads. There is one important caveat: two threads must not interact with a global object (like stdout) that uses the file API. Fortunately, such situations are rare in practice because my library seldom makes blocking system calls. When the library does make a blocking system call (e.g., the FilePrinter), it's pretty clear that the thread needs to own a local copy of FilePrinter. Global objects that make nonblocking system calls are immune to race conditions.
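The single-mutex scheme can be modeled in a few lines of Python. The names thread_main, yield_, and blocking_expr are my stand-ins, not the actual API, and an ordinary function call stands in for the system call:

```python
import threading

giant = threading.Lock()            # the single mutex

def thread_main(body):
    giant.acquire()                 # a thread runs exclusively while holding it
    try:
        body()
    finally:
        giant.release()

def yield_():
    giant.release()                 # give other threads an opportunity to run
    giant.acquire()                 # ...then immediately try to lock it again

def blocking_expr(fn, *args):
    giant.release()                 # other threads may run during the call
    try:
        result = fn(*args)          # e.g. CreateFile(...) on Windows
    finally:
        giant.acquire()             # relock before returning the result
    return result

log = []
def body():
    log.append(blocking_expr(sum, [1, 2, 3]))
    yield_()
    log.append("done")
thread_main(body)
```

Everything between lock operations runs exclusively, so no other mutexes are needed; only the explicit release points create opportunities for interleaving.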

Occasionally, a program wants to perform expensive computations on other processors as an optimization. Each worker processor performs the computation simultaneously while the main processor services the GUI. This situation calls for a non-mutual-exclusion macro, BLOCKING_STMT, that gives the system permission to run the statements within the macro simultaneously with statements from other threads. Since the macro opens up the possibility of many race conditions occurring throughout my library, the statements within the macro require careful scrutiny. Each call to a library function must be searched for global-variable accesses. Fortunately, the number of global variables in my library is small. In particular, the code inside BLOCKING_STMT can safely allocate memory (the memory manager uses thread-local storage), open a file, write to a file, read from a file, and close it again.
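The statement form can be sketched the same way as the expression form, here as a Python context manager (again my naming, not the actual macro): the global mutex is dropped for the duration of the block, so the body may run simultaneously with other threads and must therefore touch only thread-local state.

```python
import threading

giant = threading.Lock()            # the scheduler's single mutex

class blocking_stmt:
    def __enter__(self):
        giant.release()             # statements inside may run concurrently
    def __exit__(self, *exc):
        giant.acquire()             # resume exclusive execution

giant.acquire()                     # a thread normally runs holding the mutex
results = []                        # local to this thread's work
with blocking_stmt():
    # Expensive computation using only local state and safe library
    # calls (memory allocation, file I/O).
    results.append(sum(x * x for x in range(1000)))
```

After the block exits, the thread again holds the mutex and runs exclusively.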