Orth Semantics

Introduction
Types
Declarations
Pragmas
- dll
- base_address
- lib
Attributes
Expressions
Statements
- Scope
- If
- Select
- While
- Do
- For
- Label
- Goto
- Break and Continue
- Return

Introduction

Unlike C/C++, the order of evaluation is well-defined in debug and release builds. Side effects always occur in the order that they appear in the file.

Orth does not have C++ temporaries. A temporary is an anonymous object of a user-defined type (UDT) that the C++ compiler automatically creates in the middle of an expression. Unlike an r-value, a temporary has an address and it might have a destructor that must execute at a precise time and in a precise order.

The compiler organizes the program's code and data into four sections. It puts the code for each function in the TEXT section, constants in the RDATA section, shared variables with initializers in the DATA section, and shared variables without initializers in the BSS section. The RDATA section is read-only, meaning that any attempt to modify a constant will generate an access violation. The BSS section doesn't occupy any space in the executable. The Windows loader initializes each byte of each variable in the BSS section to zero.

Types

Sizes and Alignments

Type	Size	Alignment
`void`	0	1
`typeof(null)`	0	1
`typeof(uninit)`	0	1
`bool`	1	1
`char`	1	1
`wchar`	2	2
`dchar`	4	4
`byte`	1	1
`ubyte`	1	1
`short`	2	2
`ushort`	2	2
`int`	4	4
`uint`	4	4
`long`	8	8
`ulong`	8	8
`single`	4	4
`double`	8	8
pointer	word²	word²
function	word²	word²

²Throughout this document, I use word to mean int on 32-bit machines and long on 64-bit machines.

Empty Types

An empty type, such as void, int[0], or struct{}, always has a valid nonnull address. Empty shared and local variables have a unique address. Empty member variables typically share the same address as another member. Returning an empty type leaves the function's result (EAX on x86) uninitialized. Copying an empty type is a NOP. For instance, the following code is legal but produces no assembly instructions:

void a
void b:=a
a:=b

Void Type

The main purpose of void is to declare a function that doesn't return a value. void is also useful as a placeholder for irrelevant or meaningless data. For instance, if we had a map template with two parameters, First and Second, we could define a set of integers by using int for First and void for Second.

Null Type

The null keyword replaces NULL in C and 0 in C++. null is a constant whose type is typeof(null). Since typeof(null) is awkward to type, you'll probably want to give it a shorter name like Null using a typedef.

null converts to any pointer or function type. The conversion yields a pointer with all bits cleared. null is also a convenient way of clearing user-defined types. For instance, we could define a structure whose constructor took a single typeof(null) argument. We could then clear the structure using the same syntax as that for clearing a pointer.

Uninitialized Type

uninit is a constant value whose type is typeof(uninit). Since typeof(uninit) is awkward to type, you'll probably want to give it a short name like Uninit using a typedef.

uninit converts to any type. The compiler initializes small variables (like int) with the bit pattern 0y110011001100... in debug builds and leaves large variables (like arrays) uninitialized. The keyword is useful for circumventing a compiler warning about an uninitialized variable. For example:

int x:=uninit
if(foo())
    x:=bar()
...
if(foo())
    baz(x) //Without the uninit initializer, the compiler might complain that x is
           //uninitialized here

Future versions of Orth will require an initializer for each member of a structure. uninit is a convenient way of emphasizing that an array initially contains garbage:

struct X
{
    int a:=0
    int[100] b //warning: b has no initializer
    int[100] c:=uninit //ok, c initially contains garbage
}

Finally, a user-defined dynamic array might want to provide an optimized insertion function that leaves the new elements uninitialized:

struct ArrayOfInts
{
    void appendn(int n, int value) {...} //append n copies of value
    void appendn(int n, typeof(uninit)) {...} //reserve space for n integers but don't
                                              //initialize them
}

Character Types

Orth defines three character types: char, wchar, and dchar. A char represents a single ASCII character in the range [0x00,0x7F]³ or a single byte of a UTF-8 string. A wchar represents a Unicode character in the range [0x0000,0xFFFF] or a 16-bit unit of a UTF-16 string. A dchar represents any Unicode character.

A character literal is a constant dchar. Converting a constant dchar to a char generates a warning if the value is outside the range [0x00,0xFF], and converting a constant dchar to a wchar generates a warning if the value is outside the range [0x00,0xFFFF]. char and wchar are unsigned whereas dchar is signed.

³A char can also represent an extended character (0x80 through 0xFF) in the current locale's code page (typically Windows-1252 for English speakers). Since I hate working with locales, I always use Unicode for non-ASCII characters.

Floating-Point Types

single is a single-precision floating-point value that is equivalent to float in C/C++. double is a double-precision floating-point value. Throughout this document, float refers to any floating-point type (single precision or double precision).

Pointer Types

A pointer type uses the syntax type^. The type in front of the caret is called the base type. Orth uses a caret in place of an asterisk to make the syntax easier to parse. The base type can be any type including void, typeof(null), and another pointer type.

Function Types

A function type uses the following syntax:

result cdecl( param0, param1, ... )
result stdcall( param0, param1, ... )
result internal( param0, param1, ... )

The first two types are external function types. The third type is an internal function type. Orth uses an external calling convention for calls to functions in shared libraries (cdecl or stdcall) and an internal calling convention for calls to Orth functions (simply internal). The specifics of the internal convention (argument order, argument size, caller/callee stack cleanup, etc.) are nonstandard and subject to change. On x64, cdecl and stdcall are equivalent because x64 has a single calling convention. Orth does not support the fastcall and thiscall conventions.

Unlike C/C++, a function is a pointer, so an asterisk (C/C++ syntax) or caret (Orth syntax) is unnecessary. Similarly, taking the address of a function is redundant (and illegal in Orth) because accessing a function gives you a pointer r-value.

Similar to C/C++, each parameter can be either a type or a variable declaration. In the latter case, the compiler ignores the variable's symbol.

Of the three types above, only the first two are valid Orth code. The third is an internal compiler type that you'll see only in conversion errors and such.

int cdecl(int) f //ok
int stdcall(int) f //ok
int internal(int) f //error: invalid type

By default, each function declaration uses the internal calling convention. You can change a function's calling convention using the cdecl and stdcall attributes. Exporting a function or declaring a function import automatically uses the cdecl convention.

int foo(int x) //foo uses internal calling convention
        return x+1
cdecl int bar(int x) //bar uses cdecl calling convention
        return x+1
int cdecl(int y) f:=foo //invalid: cannot convert 'int internal(int)' to 'int cdecl(int)'
f:=bar //ok, f is now an alias for bar
       //the function type's parameter names need not match bar's parameter names

Array Types

An array type uses the syntax type[count]. The type in front of the bracket is called the element type.

An array can have zero elements, but be careful with the struct hack. The struct hack is where you put a zero-sized array at the end of a structure as a placeholder for an array of unknown size. Since Orth automatically rearranges structure members, either refactor the code or use cdecl struct to force the compiler to lay out the structure using the C/C++ ABI.

An array can have a maximum of 1,048,576 (2²⁰) elements. The limit is mainly a sanity check: if you try to create an array with 0x7FFF_FFFF elements, then your code probably contains a mistake.

Declarations

Scopes

The scopes in an Orth program form a tree. There is a single root: the global scope. Each source file contributes declarations to the same global scope. Each aggregate declaration creates one subscope and each function declaration creates two nested subscopes: one for the parameters and one for the body. For example, the program below has ten scopes:

struct S
{
        int a
        struct T
                shared int b
        int foo(int c)
        {
                int d:=c+1
                return d
        }
}
int bar(int e)
{
        label1::
        if((int f:=e*2)<10)
        {
                label2::
                int g:=f+1
                return f
        }
        else
        {
                int g:=f-1
                return f
        }
}
int h

These ten scopes form the following tree: (Parenthesized symbols indicate the scope of each declaration.)

global scope (S, bar, h)
- S (a, T, foo)
  - T (b)
  - parameters of foo (c)
    - body of foo (d)
- parameters of bar (e)
  - body of bar (label1 and label2)
    - outer scope of if statement (f)
      - inner scope of if body (g)
      - inner scope of else body (g)

Even though b is shared, it still belongs to the S.T scope. e, f, and g have the same lifetime (they are each part of bar's stack frame) but they belong to different scopes. A variable's scope determines visibility and not storage. A label always belongs to the body of the innermost function that contains it.

Shadowing

By default, shadowing causes a warning. A declaration shadows another declaration if (1) the declarations declare the same symbol; and (2) the shadowed declaration is visible from the shadowing declaration's scope (i.e., looking up the symbol in the shadowing declaration's scope succeeds). Shadowing causes subtle errors because declaring a new symbol can silently change the meaning of existing code. Occasionally, shadowing is useful, so Orth lets you suppress the shadowing warning using the shadow attribute. The shadow attribute allows a declaration to shadow a declaration in a different scope. A declaration can never shadow a declaration in the same scope. The compiler doesn't allow function parameters to shadow.

void foo(int i)
{
    int i //warning: i shadows a parameter
    shadow int i //ok
    if((int i:=bar())>0) //warning: i shadows a local variable
    {
        int i //warning: i shadows a local variable
        int i //error: this scope already contains i
        shadow int i //error: this scope already contains i
        int j
    }
    else
        int j //ok, first j is invisible from this scope
}

Lookups

Symbol lookup is simple in Orth 0.3. Beginning in the scope of the lookup and ending in the global scope, the compiler searches for the first enclosing scope that declares the symbol.
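
The lookup rule can be modeled in a few lines of Python. This is only an illustrative sketch, not the compiler's implementation; the Scope class, its fields, and the lookup function are hypothetical names invented for the example. The sample data is part of the scope tree from the Scopes section.

```python
class Scope:
    """A hypothetical model of one scope: its own symbols plus a parent link."""

    def __init__(self, name, symbols, parent=None):
        self.name = name
        self.symbols = set(symbols)
        self.parent = parent  # None for the global scope


def lookup(scope, symbol):
    """Return the innermost enclosing scope that declares symbol, or None."""
    while scope is not None:
        if symbol in scope.symbols:
            return scope
        scope = scope.parent
    return None


# Part of the scope tree from the Scopes section.
global_scope = Scope("global scope", {"S", "bar", "h"})
bar_params = Scope("parameters of bar", {"e"}, global_scope)
bar_body = Scope("body of bar", {"label1", "label2"}, bar_params)
if_outer = Scope("outer scope of if statement", {"f"}, bar_body)
if_inner = Scope("inner scope of if body", {"g"}, if_outer)

print(lookup(if_inner, "e").name)  # parameters of bar
print(lookup(if_inner, "h").name)  # global scope
print(lookup(bar_body, "g"))       # None: g is invisible from bar's body
```

The same walk explains the shadowing rule: a declaration shadows another when a lookup of its symbol, started from its own scope, would already succeed.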

Anonymous Declarations

You can make a declaration anonymous by using anon in place of the declaration's symbol (or omitting the symbol of a struct with braces). An anonymous declaration will not conflict with any other anonymous declarations. Accessing an anonymous declaration is impossible. anon is useful for declaring RAII objects whose only purpose is to initialize something in their constructor and finalize something in their destructor. anon is useless for a typedef, so the compiler doesn't allow it.

Shared Declarations

The shared keyword replaces the C/C++ static keyword. Every thread in the process shares the value of a shared variable. Every declaration in the global scope and every constant is automatically shared. [In Orth 0.3, every function is automatically shared also.]

Unlike C++, shared variables cannot have constructors or destructors. Additionally, their initializers must be constant. This limitation simplifies the language enormously while retaining most of the benefits of object-oriented programming. For example, Orth avoids the C++ initialization-order problem where one global variable accesses a global variable in a different module before it's initialized. Orth also makes it easier to walk the heap and report memory leaks. Since Orth doesn't have global finalizers that release heap memory, a program doesn't need to force each global finalizer to execute prior to running a leak test.

Constant Declarations

The const keyword indicates a value that is known at compile time or link time. For instance, the size of a pointer is known at compile time and the address of a shared variable is known at link time. Each constant must have an initializer.

Unlike C/C++, a constant is automatically shared, so writing shared const is redundant. Orth doesn't support "readonly" variables. In other words, you can't instantiate a local variable, initialize it, and make it readonly for the remainder of its lifetime.

Accessing a constant from any point in the code automatically evaluates and substitutes the constant's value. The constant's value may, in turn, depend on other constants whose value is then substituted into the expression. The process continues until either (1) the compiler is able to determine the value of all constants in the chain; or (2) the compiler detects a dependency cycle. In the latter case, the compiler issues an error and "gives up" on every constant and expression involved in the dependency cycle. The example below contains four constants with an illegal dependency cycle involving c, e, and f.

int foo()
        return c //to evaluate c, the compiler follows the steps below
const int c:=d+e
const int f:=c
const int d:=1
const int e:=f

Step 1: c is a constant, so the compiler substitutes its value, d+e
Step 2: d is a constant whose value is 1, so the expression becomes 1+e
Step 3: e is a constant whose value is f, so the expression becomes 1+f
Step 4: f is a constant whose value is c, so the expression becomes 1+c

Since the compiler is already trying to determine the value of c in step 1, it issues an error and gives up on c, e, and f. It will subsequently ignore any expressions that make use of these constants.
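
The substitution process can be sketched in Python as a recursive evaluation that remembers which constants are currently being evaluated. This is an illustration under simplifying assumptions, not the compiler's algorithm: each constant is modeled as a list of terms to be summed, and the function name evaluate and the error text are invented for the example.

```python
def evaluate(name, constants, in_progress=None):
    """Evaluate constant `name`, raising ValueError on a dependency cycle.

    `constants` maps each name to a list of terms to be summed; a term is
    either an int or the name of another constant (a simplification made
    for this sketch).
    """
    if in_progress is None:
        in_progress = []
    if name in in_progress:
        # We are already evaluating this constant further up the chain.
        cycle = in_progress[in_progress.index(name):] + [name]
        raise ValueError("dependency cycle: " + " -> ".join(cycle))
    in_progress.append(name)
    total = 0
    for term in constants[name]:
        if isinstance(term, int):
            total += term
        else:
            total += evaluate(term, constants, in_progress)
    in_progress.pop()
    return total


# The four constants from the example: c:=d+e, f:=c, d:=1, e:=f
constants = {"c": ["d", "e"], "f": ["c"], "d": [1], "e": ["f"]}

print(evaluate("d", constants))  # 1
try:
    evaluate("c", constants)
except ValueError as err:
    print(err)  # dependency cycle: c -> e -> f -> c
```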

Imports

Orth imports let you call functions and access data in DLLs. Calling an imported function (or accessing an imported variable) is a three-step process.

The first step is to write a pragma(lib) directive containing the path of the shared library you want to import, such as pragma(lib,"kernel32.dll").

The scope of the pragma directive doesn't matter. The library path is case insensitive and can use either forward slashes or backslashes. The library path can be an absolute path or a relative path. If the path is relative, then the compiler uses the following search order to locate the library:

1. Each directory specified in a global pragma(lib_path) directive. Each lib_path can be either absolute or relative to the input directory (i.e., the first argument to the compiler).
2. The Windows system directory
3. The Windows directory
4. Each directory in the PATH environment variable

A library path consists of an optional location (i.e., the part preceding the final forward slash or backslash) and a required name. If the location isn't present, then the location defaults to ".". The required name must exactly match the filename on disk (i.e., the compiler doesn't automatically add the .dll extension). If the program contains multiple pragma(lib) directives with the same name, then the location of each must match notwithstanding case and forward/backslash conversions. For instance:

pragma(lib,"./kernel32.dll")
pragma(lib,"KERNEL32.DLL") //ok, location is "." in both cases
pragma(lib,"C:/WINDOWS/SYSTEM32/KERNEL32.DLL") //error: "C:/WINDOWS/SYSTEM32" conflicts with "."
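
The location/name rule can be sketched in Python as follows. This is a rough model, not the compiler's code: split_library_path and check_libraries are hypothetical helper names, and the sketch does not simplify paths such as "./." or "a/../b".

```python
def split_library_path(path):
    """Split a library path into (location, name), normalized for comparison."""
    normalized = path.replace("\\", "/").lower()
    location, slash, name = normalized.rpartition("/")
    if not slash:
        location = "."  # no location present, so it defaults to "."
    return location, name


def check_libraries(paths):
    """Return {name: location}, raising ValueError if two locations conflict."""
    locations = {}
    for path in paths:
        location, name = split_library_path(path)
        previous = locations.setdefault(name, location)
        if previous != location:
            raise ValueError(f'"{location}" conflicts with "{previous}"')
    return locations


print(check_libraries(["./kernel32.dll", "KERNEL32.DLL"]))
# {'kernel32.dll': '.'}
try:
    check_libraries(["./kernel32.dll", "C:/WINDOWS/SYSTEM32/KERNEL32.DLL"])
except ValueError as err:
    print(err)  # "c:/windows/system32" conflicts with "."
```
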

The second step is to write a function or variable import as follows:

import convention type function_name(param0, param1, ...)
import convention type variable_name

The convention is either cdecl or stdcall. If you omit the convention, then the convention defaults to cdecl. x64 has a single calling convention, so the choice between cdecl and stdcall is arbitrary.

Notice that the function import lacks a body. The compiler distinguishes function imports from variable imports by looking for a parenthesized argument list after the name. A variable import cannot have an initializer. Each function parameter can be either a type or a variable declaration. In the latter case, the compiler ignores the variable's name.

The import declares a symbol in the current scope. For instance, you could organize all of the KERNEL32 imports in a structure named Kernel32. Imports are automatically shared, so you could call an import named foo using Kernel32.foo. An import's name must be an identifier containing only ASCII characters (_, $, a-z, A-Z, and 0-9).

The third step is to call the imported function (or access the imported variable) exactly as you would any other function.

The compiler uses the following algorithm to locate each import.

First, the compiler searches the code for pragma(lib) directives and makes a list of available libraries. It ignores scope and merges pragma(lib) directives that refer to the same library using the library's name (the part after the final forward slash or backslash). For example, foo.dll, ../FOO.DLL, and c:\bar\Foo.Dll all refer to the same library because the name of each one (converted to lowercase) is foo.dll. If a program contained a pragma(lib) for any two of these paths, the compiler would print an error because it wouldn't know which string to insert into the executable's import table (even if each path resolved to the same file on disk).

Next, the compiler opens each available library, extracts the exported symbols, and makes a table of available symbols.

Third, it scans the code for import declarations and looks up each symbol in the table. If there is more than one occurrence in the table, the import is ambiguous and the compiler issues an error. If there are no occurrences, then the import is undefined, and the compiler issues an error. Otherwise, the compiler remembers which library contains the import.
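
The symbol-table step can be modeled in Python as shown below. This is a sketch only: resolve_imports is a hypothetical name, the libraries a.dll and b.dll and their exports are made up, and the real compiler reads the export tables from the DLL files rather than from a dictionary.

```python
def resolve_imports(library_exports, imports):
    """Map each imported symbol to the single library that exports it.

    library_exports: {library name: set of exported symbols}
    imports: iterable of (possibly decorated) symbol names
    """
    # Build the table of available symbols: symbol -> libraries exporting it.
    table = {}
    for library, symbols in library_exports.items():
        for symbol in symbols:
            table.setdefault(symbol, []).append(library)

    resolved = {}
    for symbol in imports:
        owners = table.get(symbol, [])
        if len(owners) > 1:
            raise LookupError(f"import '{symbol}' is ambiguous: {sorted(owners)}")
        if not owners:
            raise LookupError(f"import '{symbol}' is undefined")
        resolved[symbol] = owners[0]
    return resolved


# Hypothetical libraries and exports, for illustration only.
library_exports = {"a.dll": {"foo", "common"}, "b.dll": {"bar", "common"}}

print(resolve_imports(library_exports, ["foo", "bar"]))
# {'foo': 'a.dll', 'bar': 'b.dll'}
for symbol in ["common", "baz"]:
    try:
        resolve_imports(library_exports, [symbol])
    except LookupError as err:
        print(err)
# import 'common' is ambiguous: ['a.dll', 'b.dll']
# import 'baz' is undefined
```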

For cdecl imports, the compiler looks up the name exactly as you typed it. For stdcall imports, the compiler adds an underscore to the front of the symbol and an at sign followed by the total number of parameter bytes to the end (e.g., _foo@4). Unfortunately, Windows DLLs such as KERNEL32.DLL don't decorate their exports (even though they are stdcall), so the compiler is unable to match them properly. To remedy this problem, the pragma(undecorated) attribute forces the compiler to look up a stdcall import exactly as you typed it.
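
The decoration rule is small enough to state as a Python function. The function name decorated_name and its parameters are hypothetical; the sketch takes the total parameter byte count as an input rather than computing it from the parameter types.

```python
def decorated_name(symbol, convention, param_bytes=0, undecorated=False):
    """Return the name the compiler would look up for an import."""
    if convention == "stdcall" and not undecorated:
        return f"_{symbol}@{param_bytes}"
    return symbol  # cdecl, or stdcall with pragma(undecorated)


print(decorated_name("foo", "cdecl"))                        # foo
print(decorated_name("foo", "stdcall", param_bytes=4))       # _foo@4
print(decorated_name("foo", "stdcall", 4, undecorated=True)) # foo
```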

Assuming that no errors occurred, the compiler inserts an import table into the executable or DLL containing the (possibly decorated) name of each import and the library which contains it. At run time, the Windows loader resolves the address of each import in this table.

Exports

You can export any cdecl or stdcall function and any shared variable using the export attribute. Exports are useful only when you're building a DLL. The compiler inserts an export table in the DLL containing the decorated name and address of each exported declaration. For cdecl functions, the decorated name in the table matches the declaration's symbol. For stdcall functions, the decorated name has a leading underscore and a trailing at sign followed by the number of parameter bytes (e.g., _foo@4). Each export must have a unique decorated name (but not necessarily a unique name). An export's name must be an identifier containing only ASCII characters (_, $, a-z, A-Z, and 0-9).

Aggregates

Orth 0.3 supports only one type of aggregate: a struct. We refer to the nonshared variable declarations in an aggregate as member variables. Unlike shared variables, member variables don't occupy memory in the program's image. Instead, a member variable is simply an offset from the beginning of the aggregate. Orth 0.3 doesn't support initializers for member variables. Later versions of Orth will support constant member initializers. By default, the compiler is free to rearrange the member variables for the optimal fit. Occasionally, a program needs to transfer a structure to or from a shared library. For these situations, the cdecl and stdcall attributes force the compiler to lay out the structure using the C/C++ ABI. cdecl and stdcall are equivalent. For example:

cdecl struct Foo
{
        byte a
        int b
        byte c //cdecl forces the compiler to place c after b
}
import cdecl void bar(Foo^)

Since a Foo^ is an argument to an imported function, bar, the compiler emits a warning if you forget the cdecl attribute in front of the struct.

The compiler uses the following procedure to lay out a structure:

1. The compiler determines the size and alignment of each member. By default, a member's size and alignment match the size and alignment of its type. You can override the alignment using the alignas() attribute. alignas() has a single argument: a constant integer (1, 2, 4, or 8) or a type. The compiler uses the specified integer or the specified type's alignment in place of the default alignment. The new alignment can be less than or greater than the old alignment.
2. If the structure is internal (i.e., it lacks the cdecl and stdcall attributes), then the compiler sorts the members in ascending alignment order. Members with the same alignment remain in the same relative order.
3. The compiler calculates the aggregate's alignment. By default, the aggregate's alignment is the maximum alignment of any member. You can increase or decrease this default using the alignas() attribute in front of the aggregate keyword. As before, alignas() has a single argument: a constant integer (1, 2, 4, or 8) or a type.
4. The compiler clamps the alignment of each member between 1 and the aggregate's alignment. In practice this means you can remove all of a structure's padding using alignas(1).
5. The compiler initializes a counter named currentOffset to zero.
6. For each member:
   - If the member's size is zero, then the member's offset is zero.
   - Otherwise, the compiler rounds up currentOffset to a multiple of the member's alignment. This value becomes the member's offset. Then, the compiler increases currentOffset by the member's size.
7. The compiler rounds up currentOffset to a multiple of the structure's alignment. This value becomes the structure's size.
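
The layout procedure translates almost directly into Python. The sketch below is a model under stated assumptions, not the compiler's code: lay_out and round_up are hypothetical names, each member is given as a (name, size, alignment) tuple with any per-member alignas() already applied, and struct_align stands in for an alignas() attribute on the aggregate. The sample data is the Foo structure from the example above.

```python
def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def lay_out(members, internal=True, struct_align=None):
    """Lay out a structure and return (offsets, size, alignment).

    members: list of (name, size, alignment) tuples.
    internal: True unless the struct has the cdecl or stdcall attribute.
    struct_align: the aggregate's alignas() value, or None for the default.
    """
    if internal:
        # sorted() is stable, so equal alignments keep their relative order.
        members = sorted(members, key=lambda m: m[2])
    if struct_align is None:
        struct_align = max((m[2] for m in members), default=1)
    offsets = {}
    current_offset = 0
    for name, size, align in members:
        align = max(1, min(align, struct_align))  # clamp to [1, struct_align]
        if size == 0:
            offsets[name] = 0  # empty members always sit at offset zero
            continue
        current_offset = round_up(current_offset, align)
        offsets[name] = current_offset
        current_offset += size
    return offsets, round_up(current_offset, struct_align), struct_align


foo = [("a", 1, 1), ("b", 4, 4), ("c", 1, 1)]  # byte a; int b; byte c
print(lay_out(foo, internal=False))  # ({'a': 0, 'b': 4, 'c': 8}, 12, 4)
print(lay_out(foo, internal=True))   # ({'a': 0, 'c': 1, 'b': 4}, 8, 4)
print(lay_out(foo, internal=False, struct_align=1))  # ({'a': 0, 'b': 1, 'c': 5}, 6, 1)
```

In this model, the cdecl layout of Foo occupies 12 bytes, the internal layout occupies 8 bytes because the two byte members are packed together, and alignas(1) removes the padding entirely.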

An aggregate can contain only declarations and pragmas.

Typedefs

A typedef is like a constant in many respects. Accessing a typedef automatically evaluates its type, which can recursively depend on other typedefs, structures, constants, etc. The compiler issues an error for infinite recursion. The compiler evaluates only the type and not the size or alignment of that type. For example:

struct S
{
        S a //illegal: size of S depends on size of a, which depends on size of S
        S^ b //ok: size of S depends on size of b, which is unrelated to the size of S
        T^ t //evaluate T, which is an S, but don't evaluate the size of S because
             //we don't know it yet!
        typedef T:=S
}

Function Parameters

A parameter declares a local variable in the function's scope. Each parameter must declare exactly one variable (e.g., int x,y would be ambiguous because y looks like a second parameter). A parameter without a name is anonymous (e.g., int by itself is synonymous with int anon). Future versions of Orth will support default initializers and keyword arguments.

Unlike C/C++, the compiler passes arrays by value.

Compound Initializers

A compound initializer initializes each element of an array or each member of an aggregate. Orth allows you to reorder the initializers using designators. An array designator is a constant integer, a constant closed/clopen range of integers, or two periods (..) followed by a colon. An aggregate designator is the name of an aggregate member followed by a colon.

struct X { int a; int[10] b}
X x:={b:{1..3:1,0:0,..:2},a:1}

Initializing an element/member more than once is illegal. Unlike C/C++, neglecting to initialize an element/member causes an error. An initializer without a designator initializes the element/member after the last element/member initialized by the previous initializer (if there is one) or causes an error (if there isn't). The default initializer (..) initializes the remaining uninitialized elements of an array (if any).

Side effects occur in the same order as their position in the file. The compiler evaluates the default initializer (if present) exactly once in all cases.

Pragmas

Like C/C++, a pragma() directive is a practical consideration that is often platform-specific or compiler-specific. There are two forms of pragmas: statements and attributes. A statement pragma appears on a separate line anywhere in a source file. Its scope isn't important. The compiler silently merges redundant pragmas and generates a warning for incompatible pragmas (e.g., specifying base_address as 0x10000000 in one file and base_address as 0x20000000 in another file). An attribute pragma appears in front of an expression (typically a declaration) and affects the compiler's output for that expression.

Orth 0.3 supports the following pragma() directives:

dll Statement

pragma(dll)

By default, the compiler creates an application. This directive instructs the compiler to generate a library instead.

base_address Statement

pragma(base_address, integer literal )

The program occupies a contiguous block of virtual memory starting at the image's base address. You can explicitly set the base address using this directive. The second argument must be an integer literal. pragma(base_address) is useful for resolving an address conflict between two libraries. If two libraries overlap (i.e., they attempt to use the same region of virtual memory), the system must rebase one of them to a different address. Rebasing slows down program startup.

lib Statement

pragma(lib, string literal )

The pragma(lib) directive tells the compiler to search the specified library for imports. The second argument must be a string literal. See Imports for more information.

lib_path Statement

pragma(lib_path, string literal )

The directive must appear in the global scope. By default, the compiler searches the Windows system directory, the Windows directory, and each directory in the PATH for libraries. pragma(lib_path) tells the compiler to search the specified path before the paths above. The path can be relative or absolute. If the former, then it is relative to the input directory (not the location of the file containing the directive). The compiler merges paths that match (notwithstanding case and forward/backslash conversions) and puts them in an arbitrary order which doesn't necessarily match their order in the file.

undecorated Attribute

pragma(undecorated) declaration

This directive appears in front of an imported stdcall function. It instructs the compiler to look up the import exactly as you typed it rather than decorating it with a leading underscore, trailing at sign, and trailing parameter byte count. See Imports for more information.

Attributes

There are two forms of attributes. Checked attributes appear in front of an expression (typically a declaration) and affect the meaning of that expression. The compiler prints a warning if the attribute has no effect or if the attribute is repeated. Unchecked attributes appear above a statement (or a group of statements surrounded by braces). The compiler prints a warning if the unchecked attribute is repeated but silently ignores an unchecked attribute that has no effect on any statement beneath it.

const export //two unchecked attributes
{
    const int x:=123 //warning: const is repeated
    void foo() {} //ok: the compiler silently ignores const
    alignas(4) void foo() {} //warning: alignas() has no effect
    const alignas(4) //warning: const is repeated
        void bar() {} //ok: the compiler silently ignores const and alignas()
}

Orth 0.3 supports the following attributes:

shared (see Shared Declarations)
const (see Constant Declarations)
import (see Imports)
export (see Exports)
cdecl (see Imports, Exports, and Aggregates)
stdcall (see Imports, Exports, and Aggregates)
alignas (see Aggregates)
shadow (see Shadowing)

Expressions

Each expression must appear inside a function's body or the body of a statement. Aggregates and the global scope cannot contain expressions.

Typeless Integers

Orth infers the type of an integer literal from its context, making C/C++ integer suffixes unnecessary. When the compiler sees an integer literal, it creates a placeholder for it called a typeless integer. Converting the typeless integer to a specific type (e.g., initializing long l with 0x1234_1234_1234_1234) replaces the typeless integer with a typed value. Operations involving only typeless integers yield another typeless integer (e.g., 1+2 yields 3) or a constant bool (e.g., 1<2 yields true). Operations involving a combination of typeless integers and typed values yield typed values in some cases (e.g., 1+x) and typeless expressions in other cases (e.g., 1<<x and b?2:3). A typeless expression can contain an arbitrarily large number of operations (e.g., a?1:b?2:c?3:4). The type of each integer literal in the typeless expression is unknown until the type of the outermost typeless expression is known (e.g., converting a?1:b?2:c?3:4 to a uint converts 1, 2, 3, and 4 to uint).

There are 15 operations that can produce typeless expressions:

- typeless
~ typeless
typeless + typeless
typeless - typeless
typeless * typeless
typeless / typeless
typeless << int
typeless >> int
typeless % typeless
typeless & typeless
typeless @ typeless
typeless | typeless
nontype , typeless
bool ? typeless : typeless
unreachable

For 9 of these operations, the compiler is able to infer the type of the typeless expression from the other operand:

typeless + T → T
T + typeless → T
typeless - T → T
T - typeless → T
typeless * T → T
T * typeless → T
typeless / T → T
T / typeless → T
typeless % T → T
T % typeless → T
typeless & T → T
T & typeless → T
typeless @ T → T
T @ typeless → T
typeless | T → T
T | typeless → T
bool ? typeless : T → T
bool ? T : typeless → T

Type inference is not possible for shifting (<< and >>) and side effects (,). For example, the types of 1<<(int i:=foo()) and (int i:=foo()),1 are unknown without examining the expression that contains them.

Each typeless integer is able to convert itself to an integer or float type. Each typeless unary operation (- and ~) and binary operation (+, -, *, <<, >>, /, %, &, @, and |) is able to convert itself to any integer type. The comma operator (e.g., (foo(),1)) is able to convert itself to type T if the second operand is able to convert itself to type T. The conditional operator (e.g., b?2:3) is able to convert itself to type T if the second and third operands are able to convert themselves to type T. The unreachable assertion is able to convert itself to any type because a working program will never reach the unreachable assertion.

If a typeless integer or typeless expression isn't able to convert itself to a type, then it converts itself to int. For example, consider the expression single s:=b?1<<x:2. If we naively infer the type of 1 and 2 to be single, then it's unclear how we shift 1.0 by x bits. Instead, we infer 1 to be int. Since 1<<x is then an int, 2 must also be an int. Some operators work equally well with floating point, but we nonetheless evaluate them using integer arithmetic for consistency with the shift operators. For example, we could evaluate single s:=(foo(),1)+2 by converting 1 and 2 to a single and using floating-point addition, but the presence of an arithmetic operator dictates that we convert 1 and 2 to int instead.

In rare instances, the compiler is unable to infer the type of a typeless expression from its context. There are 5 such instances:

the left operand of the comma operator
the initializer of a for statement
the increment of a for statement
a typeless expression by itself as a separate statement
a comparison between two typeless expressions (or a combination of a typeless integer and typeless expression)

In each instance, the compiler converts each typeless integer or expression to int.

Conversions

Orth 0.3 supports the following conversions:

Typeless integer to any integer

The compiler issues an error if the conversion loses significant digits. If the conversion type is an N-bit signed type, then the value must fit within the signed N-bit range (i.e., bits N-1 and above must all be set or cleared). If the conversion type is an N-bit unsigned type, then the value must fit within the signed N+1-bit range (i.e., bits N and above must all be set or cleared).

`byte`	[-0x80,0x7F]
`ubyte`, `char`	[-0x100,0xFF]
`short`	[-0x8000,0x7FFF]
`ushort`, `wchar`	[-0x10000,0xFFFF]
`int`	[-0x8000_0000,0x7FFF_FFFF]
`uint`	[-0x1_0000_0000,0xFFFF_FFFF]
`long`	[-0x8000_0000_0000_0000,0x7FFF_FFFF_FFFF_FFFF]
`ulong`	all

These rules are necessary because the compiler complements a typeless integer by negating it and subtracting one. For example, ~1 is the same as -2 (both have the bit pattern ...111111110). Since we want to be able to initialize an N-bit unsigned type with any complemented N-bit value, the compiler must accept any value in the range [-2^N,-1] in addition to any value in the range [0,2^N-1].
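The range rule above can be modeled in a few lines. This is only a sketch of the check as described in this section, not the compiler's code; the function name typeless_fits and its parameters are hypothetical.

```python
def typeless_fits(value: int, bits: int, signed: bool) -> bool:
    """Return True if a typeless integer converts to an N-bit integer type without error."""
    if signed:
        # Signed N-bit target: the value must fit the signed N-bit range.
        return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1
    # Unsigned N-bit target: accept [-2^N, 2^N - 1] so that complemented
    # N-bit values such as ~0xFF (which equals -0x100) are still accepted.
    return -(1 << bits) <= value <= (1 << bits) - 1


print(typeless_fits(~0xFF, 8, False))   # lower bound of the ubyte range
print(typeless_fits(0x100, 8, False))   # one past the upper bound of ubyte
print(typeless_fits(-0x81, 8, True))    # one below the lower bound of byte
```

Python integers have arbitrary precision, so ~0xFF evaluates to -0x100 just as the typeless complement does in the text.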

Typeless integer to any float
An overflow is impossible. The rounding mode is round to nearest if the conversion isn't exact.
Typeless expression to any type T provided that the expression's operands are convertible to type T
unreachable to any type
The conversion generates an INT3 instruction that breaks into the debugger.
single to double
double to single
The compiler generates a warning if converting a constant double to single would overflow (i.e., yield +INF or -INF). The rounding mode is round to nearest if the conversion isn't exact.
integer to float
The rounding mode is round to nearest if the conversion isn't exact.
float to integer
The compiler generates a warning if converting a constant float to an integer would overflow. The conversion truncates the value towards zero.
integer to larger integer
The conversion uses zero extension when the original type is unsigned and sign extension when the original type is signed.
integer to integer of same size
The conversion cannot overflow. The compiler reinterprets the bits as a signed or unsigned value.

integer to smaller integer

The conversion truncates the upper bits. The compiler prints a warning when a constant conversion overflows. A conversion from an N-bit integer to an M-bit integer causes an overflow if the top N-M bits don't match each other. In addition, a conversion from a signed integer to a smaller signed integer also overflows if the value's sign changes. The following conversions from 16 bits to 8 bits do not overflow:


ushort(0x0000) → ubyte(0x00)
...
ushort(0x00ff) → ubyte(0xff)
ushort(0xff00) → ubyte(0x00)
...
ushort(0xffff) → ubyte(0xff)

short(0x0000) → ubyte(0x00)
...
short(0x00ff) → ubyte(0xff)
short(-0x0100) → ubyte(0x00)
...
short(-0x0001) → ubyte(0xff)

ushort(0x0000) → byte(0x00)
...
ushort(0x007F) → byte(0x7f)
ushort(0x0080) → byte(-0x80)
...
ushort(0x00FF) → byte(-0x01)
ushort(0xFF00) → byte(0x00)
...
ushort(0xFF7F) → byte(0x7F)
ushort(0xFF80) → byte(-0x80)
...
ushort(0xFFFF) → byte(-0x01)

short(0x0000) → byte(0x00)
...
short(0x007f) → byte(0x7f)
short(-0x0080) → byte(-0x80)
...
short(-0x0001) → byte(-0x01)
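
The overflow rule for narrowing conversions can be sketched as follows. This is a model of the rule as stated above, not the compiler's code; the function name narrow and its parameters are hypothetical.

```python
def narrow(value: int, n: int, m: int, src_signed: bool, dst_signed: bool):
    """Truncate an N-bit integer to M bits; return (result, overflowed)."""
    pattern = value & ((1 << n) - 1)        # N-bit two's complement bit pattern
    top = pattern >> m                      # the N-M bits that are discarded
    # The discarded bits must all match each other (all clear or all set).
    overflow = top not in (0, (1 << (n - m)) - 1)
    result = pattern & ((1 << m) - 1)
    if dst_signed and result >= 1 << (m - 1):
        result -= 1 << m                    # reinterpret as a signed M-bit value
    # Signed to smaller signed: the sign must survive the conversion.
    if src_signed and dst_signed and (value < 0) != (result < 0):
        overflow = True
    return result, overflow


print(narrow(0xFF80, 16, 8, False, True))   # ushort(0xFF80) to byte
print(narrow(0x0080, 16, 8, True, True))    # short(0x0080) to byte: the sign changes
```

The first call reproduces a row of the list above. The second shows a value that passes the matching-bits test but still overflows because a positive short becomes a negative byte.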

pointer or function to void^
The underlying bits are unaffected.
typeof(null) to pointer or function
The resulting pointer or function has all bits cleared. See null.

typeof(uninit) to any type
The result is undefined because the program should never access it. In practice, the compiler initializes primitives with the repeating bit pattern 0xCCCC... and leaves composites uninitialized (for the sake of efficiency). See uninit.

Unary Promotions

Some unary operators "promote" their operand according to this table. The promotion rules are the same as C/C++.

double or single → double
ulong → ulong
long → long
uint → uint
int, dchar, short, ushort, wchar, byte, ubyte, or char → int

Binary Promotions

Some binary operators "promote" their operands according to this table. The promotion rules are the same as C/C++.

If either number is a double, the common type is double.
If either number is a single, the common type is single.
If either number is a ulong, the common type is ulong.
If either number is a long, the common type is long.
If either number is a uint, the common type is uint.
Otherwise, the common type is int.
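
The binary promotion list reads as a priority search. The sketch below models it with type names as strings; the function name binary_promotion is hypothetical, and it assumes both arguments are already known to be numeric types.

```python
# Priority order taken from the list above; earlier entries win.
_PRIORITY = ["double", "single", "ulong", "long", "uint"]


def binary_promotion(a: str, b: str) -> str:
    """Return the common type of two numeric operand types."""
    for candidate in _PRIORITY:
        if candidate in (a, b):
            return candidate
    return "int"  # every remaining numeric type promotes to int


print(binary_promotion("short", "uint"))
print(binary_promotion("byte", "char"))
print(binary_promotion("long", "single"))
```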

Common Types

The conditional operator finds a common type for its second and third operand. Future versions of Orth may feature templates that automatically find a common type for two or more parameters.

If each term has the same type, then that type is the common type.
If each term is a number, then promote each term using binary promotions.
Otherwise, print an error.

Accesses

symbol
ArrayType.count
AggregateType.symbol
array.address
array.count
aggregate.symbol
pointerToArray.address
pointerToArray.count
pointerToAggregate.symbol

In general, an access is a reference to a declaration in the same scope, an enclosing scope, or a specific aggregate's scope. If there is no period, then the compiler looks up the symbol starting from the current scope. Otherwise, the compiler evaluates the expression on the left-hand side of the period and searches for the symbol in the aggregate's scope (if the expression evaluates to an aggregate type or aggregate instance) or evaluates the array's count or address (if the expression evaluates to an array type or array instance).

If the expression on the left-hand side of the period is a pointer (but not a pointer type), then the compiler dereferences it repeatedly until it is no longer a pointer.

An access typically yields the value of a variable or the address of a function. Specifically:

If the access refers to a local variable, then the access fetches the variable from the stack.
If the access refers to a shared variable, then the access fetches the variable from the program's image.
If the access refers to a function, then the access produces the function's address.
If the access refers to a type, then the type replaces the access (including everything to the left of the period).
If the access refers to a label, then the access is illegal.
If the access refers to a member variable, then:
1. If there is a value on the left-hand side of the period, then the access adds the member variable's offset to the aggregate's address.
2. If there is a type on the left-hand side of the period, then the expression is illegal outside of sizeof() expressions. Inside sizeof() expressions, the access yields a dummy value whose type matches that of the variable.

In the second, third, and fourth cases of the main list above (shared variable, function, and type), the compiler discards the value (if any) on the left-hand side of the period and prints a warning if the value has side effects.

struct S
{
        int x
        shared int g
}
S foo() return S{123}
int a:=foo().g //warning: the call to foo() might have side effects that won't occur
int b:=typeof(foo()).g //ok
int c:=(foo(),S.g) //ok
int d:=sizeof(S.x+1) //ok, S.x produces a dummy variable whose type is int

If the expression on the left-hand side of the period is an array, then address or count must appear after the period. .count yields the number of elements in the array or array type. .address yields the address of the first element of the array and doesn't make sense for array types. .count and .address are useful for writing generic code. [Future versions of Orth will replace .count and .address with a conversion from an array to a user-defined "range" type].

Unreachable

unreachable

The unreachable keyword instructs the compiler to insert an INT3 interrupt into the executable for debugging. Like a typeless integer, unreachable has no type. In practice, you will use unreachable in two places. The first is on a separate line after an if-statement. The second is after the final colon in a chain of conditional expressions.

assert(x==y) //perform this test only in debug builds
if(x!=y) //perform this test in debug and release builds
    unreachable
foo( value<0 ? a : value>0 ? b : unreachable) //assert that value is nonzero in debug and
                                              //release builds

Sizeof, Alignof, and Typeof

sizeof(value) → typeless_integer
sizeof(Type) → typeless_integer
alignof(value) → typeless_integer
alignof(Type) → typeless_integer
typeof(value) → Type
typeof(Type) → Type

Unlike C/C++, sizeof() and alignof() yield a typeless integer. The compiler discards the expression inside the parens. The expression is allowed to have side effects and to access member variables without an object. The expression inside the parens cannot be a typeless integer or typeless expression. The expression is not allowed to declare a variable or function.

struct Foo
        int x
const int c:=sizeof(Foo.x)
const int d:=sizeof(123) //error: expression inside parens is a typeless integer
const int e:=sizeof(int i) //error: expression inside parens declares i

Bitcast

bitcast(typeless_integer, Type) → value
bitcast(typeless_expression, Type) → value
bitcast(value, Type) → value

bitcast() simply reinterprets the bits of the first argument as the type specified by the second argument. The type must be nonempty. If the first argument is a typeless integer or expression, then the type's size must be 1, 2, 4, or 8. The compiler first converts the typeless integer or expression to the appropriate unsigned type. For instance, bitcast(123,single) is equivalent to bitcast(uint(123),single) because the size of uint matches the size of single. If the first argument is a value, then the value's size must match the size of the second argument.
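
To illustrate what reinterpreting bits means, the sketch below mimics bitcast(uint, single) and its inverse using Python's struct module. It assumes that single is a 4-byte IEEE 754 value (the text above says its size matches uint); the helper names are hypothetical.

```python
import struct


def bitcast_uint_to_single(bits: int) -> float:
    """Reinterpret a 32-bit unsigned pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bitcast_single_to_uint(x: float) -> int:
    """Reinterpret a single-precision float as its 32-bit unsigned pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


print(bitcast_uint_to_single(0x3F80_0000))  # bit pattern of 1.0
print(hex(bitcast_single_to_uint(1.0)))
print(bitcast_uint_to_single(123) == 123.0)  # a bitcast is not a numeric conversion
```

The last line shows the difference from a conversion such as single(123): the bits of the integer 123 form a tiny denormal number, not 123.0.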

Unlike the C++ reinterpret_cast, bitcast() works with an r-value instead of an address. For instance, reinterpret_cast<double>(l+1) causes a compiler error in C++, but compiles correctly in Orth if we substitute bitcast() for reinterpret_cast.

Address-of Operator

&value → pointer

Like C/C++, the prefix ampersand (&) operator yields the address of an l-value. When used with an r-value, the compiler generates an error. Unlike C/C++, an ampersand operator combined with a caret operator cancel each other out, yielding the innermost expression. It is therefore possible (but not very useful) to take the address of a dereferenced expression twice, like so:

int^ p
int^^ q:=&(&(p^))

Unlike C/C++, Orth assignment operators currently yield r-values to simplify the compiler back-end.

The expressions listed below yield l-values:

all variables (but not functions because they are already addresses)
comma operator (if the second operand is an l-value)
conditional operator (if the second and third operands are l-values)
bitcast() operator (if the first operand is an l-value)
pointer operator (^)

Pointer Operator

Type^ → Type
value^ → value

The pointer operator forms a pointer type or dereferences a pointer value. For pointer values, it undoes the address-of operator and vice versa. Dereferencing a pointer to an empty type is a legal NOP.

Function Operator

Type cdecl(Type optionalSymbol, Type optionalSymbol, ...) → FunctionType
Type stdcall(Type optionalSymbol, Type optionalSymbol, ...) → FunctionType

The result and each parameter must be a type. The compiler ignores the symbol (if any) after each parameter type. The operator creates an external function type that an Orth program needs to interface with other programming languages.

Calls

Type(nontype) → value
function(nontype, nontype, ...) → value

An Orth call uses the same convention as C++. If the expression on the left-hand side of the paren is a type, then the compiler evaluates the expression inside the parens and converts it to that type. Orth doesn't support C-style casts (e.g., (int)x) because they are hard to parse. If the expression on the left-hand side of the paren is a function, then the compiler converts each argument to the corresponding parameter type of that function. The number of arguments must match the number of parameters.

Subscripts

Type[word] → Type
array[word] → value
pointer[word] → value

The compiler first converts the expression inside the brackets to a word. If the first expression is a type, then the second expression must be a compile-time constant between 0 and 0x100000. The result is an array type with the specified element type and element count.

If the first expression is an array value, then the result is the array element indexed by the second expression. The array's element size must be nonzero. The compiler prints a warning if the index is a compile-time constant that is out of range (i.e., negative or greater than or equal to the array's element count). If the first expression is a pointer value, then the subscript is equivalent to (pointer+index)^. The compiler multiplies the index by the size of the pointer's base type, adds it to the pointer, and dereferences the result. The pointer's base type cannot be empty.

Compound Literals

Type{nontype, nontype, ...} → value

Orth compound literals are the same as C compound literals except that the type isn't parenthesized. Type{expression} is equivalent to (Type anon:={expression}) except that the former is an r-value whereas the latter is an l-value.

Comma Operator

nontype , typeless_integer → typeless_integer
nontype , typeless_expression → typeless_expression
nontype , value → value

The comma operator (also known as the side-effect operator) evaluates the first operand, discards it, and evaluates the second operand. If the right operand is a typeless integer or typeless expression, then the compiler is unable to infer the right operand's type from the left operand. Instead it creates a placeholder, called a typeless expression, and generates instructions for the placeholder once it's able to infer the type from the comma expression's context. If the left operand has no side effects, the compiler prints a warning. A side effect is any instruction that modifies memory (assignment and increment) or calls a subroutine.

Conditional Operator

true ? type1 : type2 → type1
false ? type1 : type2 → type2
true ? typeless_integer1 : typeless_integer2 → typeless_integer1
false ? typeless_integer1 : typeless_integer2 → typeless_integer2
bool ? typeless : typeless → typeless_expression
bool ? value : value → value

The form involving two types requires a constant boolean. If the first operand is a constant, the compiler evaluates it at compile time and replaces the conditional with the second operand (if the condition is true) or the third operand (if the condition is false).

If the second and third operands are typeless integers or expressions, the compiler is unable to infer their type. Instead, it creates a placeholder, called a typeless expression, and generates instructions for the placeholder once it's able to infer the type from the conditional expression's context. Otherwise, the compiler converts both operands to a common type and evaluates the conditional at runtime. The result is an l-value iff both operands are l-values.

Not

! bool → bool

The compiler converts the operand to a boolean and inverts it by xor'ing the value with 1. In practice, this means that all booleans must have the value 0 or 1 to function properly, which might become an issue when exchanging booleans with third-party libraries.

Negate

- typeless_integer → typeless_integer
- typeless_expression → typeless_expression
- float → float
- integer → integer

If the operand is a typeless integer, then the result is another typeless integer, and an overflow occurs only if the operand is -0x1_0000_0000_0000_0000. If the operand is a typeless expression, then the result is another typeless expression. The compiler prints a warning if negating a constant signed integer causes an overflow. Since negating an unsigned integer produces an unsigned integer, overflow isn't possible for unsigned integers. The semantics of negation are the same as subtracting the value from zero.

Complement

~ typeless_integer → typeless_integer
~ typeless_expression → typeless_expression
~ bool → bool
~ integer → integer

If the operand is a typeless integer, the compiler treats it as a 65-bit two's complement value. The result is negative if the operand is nonnegative and vice versa. An overflow is not possible. The ~ and ! operators are identical for boolean operands.

Increment

++ float → float
-- float → float
float ++ → float
float -- → float
++ integer → integer
-- integer → integer
integer ++ → integer
integer -- → integer
++ pointer → pointer
-- pointer → pointer
pointer ++ → pointer
pointer -- → pointer

Preincrement, predecrement, postincrement, and postdecrement (collectively called increments) require an l-value. Incrementing a floating-point value adds 1.0 to it; decrementing subtracts 1.0. Incrementing a pointer adds the size of the pointer's base type to its value; decrementing subtracts that size. An error occurs if the pointer's base type is empty.

++expr is equivalent to expr+=1 and --expr is equivalent to expr-=1. As such, preincrement and predecrement produce r-values.

Comparison

Type == Type → true or false
Type != Type → true or false
function == function → bool
function != function → bool
bool == bool → bool
bool != bool → bool
typeless_integer op typeless_integer → true or false
typeless_expression op typeless_expression → bool
integer op integer → bool
float op float → bool
pointer op pointer → bool

op is one of ==, !=, <, <=, >, or >=.

Comparing two types with the == operator yields true if the types exactly match and false if they do not. The != operator yields the opposite of the == operator. Other comparisons (<, <=, >, and >=) are illegal for types.

To compare two typeless integers, the compiler treats them as 65-bit signed integers. Hence, a complemented nonnegative integer is automatically less than an uncomplemented nonnegative value (e.g., ~2 < 1). To compare a typeless integer as an unsigned integer, use a conversion to an unsigned type (e.g., ~uint(2) > uint(1)). The compiler is unable to infer the type of a comparison involving two typeless expressions, so it simply converts each operand to int.

Pointers use the same comparisons as unsigned integers with null being less than every nonnull value. Functions are equal if their addresses match. Booleans are equal if their bits exactly match. Orth defines 0 as false, 1 as true, and all other values as indeterminate. In practice, this means that a third-party library that represents true as 0xFF won't be able to communicate with an Orth program.

Logical

bool && bool → bool
bool || bool → bool

Logical-and and logical-or (collectively called the logical operators) are identical to their C/C++ counterparts. The compiler converts both operands to booleans and "short-circuits" the execution of the second operand if the first operand is false (in the case of &&) or true (in the case of ||).

Addition

typeless_integer + typeless_integer → typeless_integer
typeless + typeless → typeless_expression
pointer + word → pointer
number + number → number

If both operands are typeless integers, the result is a typeless integer that must fit in the 65-bit signed range (i.e., [-0x1_0000_0000_0000_0000,0xffff_ffff_ffff_ffff]). Exceeding this range causes a compile-time error. Otherwise, if both operands are typeless integers or expressions, the result is a typeless expression. If the left operand is a pointer, the right operand is converted to a word, multiplied by the size of the pointer's base type, and added to the left operand. The base type must not be empty. If the left operand is a number, the compiler picks a common type for the two operands and adds them. The compiler attempts to trap overflow at compile time. Overflow occurs if the sum exceeds the N-bit signed range (if the common type is a signed N-bit integer) or the N+1-bit signed range (if the common type is an N-bit unsigned integer), or the double-precision floating-point range (if the common type is double).
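
The compile-time range check for typeless addition can be modeled directly, because Python integers do not overflow. This is a sketch of the rule above; the names typeless_add, TYPELESS_MIN, and TYPELESS_MAX are hypothetical.

```python
TYPELESS_MIN = -(1 << 64)      # -0x1_0000_0000_0000_0000
TYPELESS_MAX = (1 << 64) - 1   #  0xffff_ffff_ffff_ffff


def typeless_add(a: int, b: int) -> int:
    """Add two typeless integers, rejecting sums outside the 65-bit signed range."""
    total = a + b
    if not TYPELESS_MIN <= total <= TYPELESS_MAX:
        raise OverflowError("typeless sum does not fit in 65 signed bits")
    return total


print(hex(typeless_add(0xffff_ffff_ffff_fffe, 1)))  # largest representable sum
try:
    typeless_add(0xffff_ffff_ffff_ffff, 1)
except OverflowError as error:
    print("error:", error)
```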

Subtraction

typeless_integer - typeless_integer → typeless_integer
typeless - typeless → typeless_expression
number - number → number
pointer - word → pointer
pointer - pointer → word

Subtraction is analogous to addition except for the additional form, which converts the two pointer operands to a common type, subtracts them, and divides the difference by the size of the pointer's base type. The base type cannot be empty. The result is a word.

Multiplication

typeless_integer * typeless_integer → typeless_integer
typeless * typeless → typeless_expression
number * number → number

Multiplication is analogous to addition.

Division

typeless_integer / typeless_integer → typeless_integer
typeless / typeless → typeless_expression
number / number → number

Integer division uses the same rounding rules as C/C++. Division by zero generates a compile-time error if possible (i.e., the right operand is a typeless integer or a constant).

Remainder

typeless_integer % typeless_integer → typeless_integer
typeless % typeless → typeless_expression
integer % integer → integer

The remainder uses the same rounding rules as C/C++ integer division. A remainder by zero generates a compile-time error if possible (i.e., the right operand is a typeless integer or a constant). The remainder is not available for floating point.
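
As a point of comparison, C99 and C++11 define integer division as truncating toward zero, so the remainder takes the sign of the left operand. The sketch below models that behavior on the assumption that Orth matches it; the function name c_divmod is hypothetical. Python's own // and % operators round toward negative infinity, which is why the sketch does not use them directly on signed values.

```python
def c_divmod(a: int, b: int):
    """Return (quotient, remainder) with the quotient truncated toward zero."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    # The identity a == quotient * b + remainder always holds.
    return quotient, a - quotient * b


print(c_divmod(-7, 2))
print(c_divmod(7, -2))
print(divmod(-7, 2))  # Python's flooring result, shown for contrast
```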

Bitwise XOR

typeless_integer @ typeless_integer → typeless_integer
typeless @ typeless → typeless_expression
integer @ integer → integer
bool @ bool → bool

Bitwise operations cannot overflow for operands in any form (typeless or typed). If both operands are typeless integers, the compiler treats each operand as a 65-bit two's complement value. The result is negative if either operand (but not both operands) is negative. Otherwise, if both operands are typeless integers or expressions, the result is a typeless expression whose type depends on the surrounding context.

If the left operand is a boolean, the right operand is converted to a boolean, and the result is true if either operand (but not both operands) is true. Since the compiler uses an 8-bit XOR instruction, the operands must be in "canonical" form to work properly (i.e., 0x00 for false and 0x01 for true). Otherwise, the compiler converts both operands to a common type.

Bitwise XOR uses @ instead of ^ to avoid a syntactic ambiguity with the pointer operator.
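
The requirement that booleans be canonical can be seen by modeling the 8-bit XOR on raw byte values. This is only an illustration of the text above; the helper names are hypothetical, and Python's ^ operator stands in for Orth's @.

```python
def bool_not(byte: int) -> int:
    """Model of ! on a boolean stored in one byte: XOR with 1."""
    return (byte ^ 1) & 0xFF


def bool_xor(a: int, b: int) -> int:
    """Model of bool @ bool using an 8-bit XOR."""
    return (a ^ b) & 0xFF


print(bool_not(0x01))        # canonical true becomes canonical false
print(hex(bool_not(0xFF)))   # noncanonical true stays nonzero after inversion
print(hex(bool_xor(0xFF, 0x01)))
```

A library that represents true as 0xFF therefore produces a value that is neither 0 nor 1 after inversion, which matches the caveat about third-party libraries.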

Bitwise AND

typeless_integer & typeless_integer → typeless_integer
typeless & typeless → typeless_expression
integer & integer → integer
bool & bool → bool

Bitwise AND is analogous to bitwise XOR. The resulting typeless integer is negative if and only if both typeless-integer operands are negative. If the left operand is a boolean that evaluates to false, the right operand is still evaluated.

Bitwise OR

typeless_integer | typeless_integer → typeless_integer
typeless | typeless → typeless_expression
integer | integer → integer
bool | bool → bool

Bitwise OR is analogous to bitwise XOR. The resulting typeless integer is negative if and only if either typeless-integer operand is negative. If the left operand is a boolean that evaluates to true, the right operand is still evaluated.

Shift Left

typeless_integer << typeless_integer → typeless_integer
typeless_integer << constant_integer → typeless_integer
typeless << int¹ → typeless_expression
integer << int¹ → integer

¹signed 32-bit integer

If the first operand is a typeless integer and the second operand is a typeless integer or a constant int, the result is the left operand multiplied by two raised to the right operand. The right operand must be nonnegative. If the result doesn't fit within the signed 65-bit range, then the compiler prints an error. Shifting zero by an arbitrarily large positive value always produces zero.

If the first operand is a typeless integer or expression and the compiler is unable to fold the shift (as above) then the result is a typeless expression whose type depends on the surrounding context.

Unlike the other arithmetic operators, shifting doesn't require the operands to have the same type. The compiler promotes the left operand to an integer (i.e., 1- and 2-byte integer types become int; the others are unchanged) and converts the right operand to an int.

Shifting by a negative number of bits causes a compiler error (if possible) or produces zero at runtime (because the code generator uses an unsigned shift amount). A shifted N-bit constant must fit within the N-bit unsigned range. That is, the shift can cause a negative value to become positive or vice versa, but the value must not lose any significant bits (i.e., you can undo the left shift with a matching right shift):

const int FIFTEEN:=15
const int foo:=FIFTEEN<<28 //ok, result is negative but no information was lost
const int bar:=FIFTEEN<<29 //error, top bit was lost

Shifting a value left by N bits yields the same result as shifting it left by 1 bit N times. In other words, left shifting an N-bit variable by N or more bits makes it zero. The compiler prints a warning if the right operand is a constant that exceeds the number of bits in the left operand.
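
The constant check from the FIFTEEN example can be sketched as below. The sketch covers only nonnegative constants of a signed N-bit type, since the text does not spell out the negative case; the function name shl_const is hypothetical.

```python
def shl_const(value: int, shift: int, bits: int = 32) -> int:
    """Left-shift a nonnegative constant of a signed N-bit type, as checked at compile time."""
    if value < 0:
        raise ValueError("this sketch covers nonnegative constants only")
    if shift < 0:
        raise ValueError("shifting by a negative number of bits")
    full = value << shift
    if full >> bits:
        # A significant bit moved past bit N-1, so a right shift cannot undo this.
        raise OverflowError("shift loses significant bits")
    # Reinterpret the N-bit pattern as a signed value; the sign may change.
    return full - (1 << bits) if full >> (bits - 1) else full


print(shl_const(15, 28))      # negative, but no information was lost
try:
    shl_const(15, 29)
except OverflowError as error:
    print("error:", error)
```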

Shift Right

typeless_integer >> typeless_integer → typeless_integer
typeless_integer >> constant_integer → typeless_integer
typeless >> int¹ → typeless_expression
integer >> int¹ → integer

¹signed 32-bit integer

If the first operand is a typeless integer and the second operand is a typeless integer or a constant int, the result is the left operand divided by two raised to the right operand rounded down. The right operand must be nonnegative. Right shifting a typeless or typed integer cannot overflow.

Unlike the other arithmetic operators, shifting doesn't require the operands to have the same type. The compiler promotes the left operand to an integer (i.e., 1- and 2-byte integer types become int; the others are unchanged) and converts the right operand to an int.

Shifting by a negative number of bits causes a compiler error (if possible) or evaluates to zero at runtime (because the code generator uses an unsigned shift amount).

Shifting an N-bit signed value to the right by N or more bits copies the sign bit to every other bit. That is, the value becomes zero if it was positive and -1 if it was negative. Shifting an N-bit unsigned value to the right by N or more bits makes it zero. The compiler prints a warning if the right operand is a constant that exceeds the number of bits in the left operand.

For typed integers, the shift is arithmetic if and only if the left operand is signed.
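
The right-shift behavior for large shift amounts can be modeled as follows. This is a sketch of the rules in this section; the function name shr is hypothetical, and a negative shift raises an exception here to stand in for the compile-time error.

```python
def shr(value: int, shift: int, bits: int, signed: bool) -> int:
    """Right-shift an N-bit integer: arithmetic if signed, logical if unsigned."""
    if shift < 0:
        raise ValueError("shifting by a negative number of bits")
    if signed:
        # Python's >> on integers is arithmetic and rounds down, so a large
        # shift yields 0 for nonnegative values and -1 for negative values.
        return value >> shift
    # Unsigned: shift the N-bit pattern and fill with zeros.
    return (value & ((1 << bits) - 1)) >> shift


print(shr(-5, 40, 32, True))            # sign bit copied to every bit
print(shr(5, 40, 32, True))
print(shr(0xFFFF_FFFF, 32, 32, False))  # unsigned value shifted out entirely
print(shr(-3, 1, 32, True))             # rounds down, not toward zero
```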

Assignment

value := nontype → value

The compiler converts the right operand to the left operand's type. The left operand must be an l-value. If the left operand is an empty type (e.g., void), then the assignment is a NOP yielding a trivial instance of the empty type. Otherwise, the assignment copies the bits from the right operand into the left operand and yields the left operand as an r-value.

Arithmetic Assignment

pointer += word → pointer
number += number → number
pointer -= word → pointer
number -= number → number
number *= number → number
number /= number → number
integer %= integer → integer
bool @= bool → bool
integer @= integer → integer
bool &= bool → bool
integer &= integer → integer
bool |= bool → bool
integer |= integer → integer
integer <<= int¹ → integer
integer >>= int¹ → integer

¹signed 32-bit integer

The left operand must be an l-value. For division and remainder, the compiler converts both operands to a common type. For left shift and right shift, the compiler converts the right operand to a 32-bit signed integer. For the remaining operators, the compiler converts the right operand to the left operand's type (if the left operand is a number) or to a word (if the left operand is a pointer).

The semantics of each operator are the same as those of the corresponding arithmetic operator. The compiler attempts to trap invalid operations at compile time (division by zero, shifting by a negative number of bits, and shifting by too many bits). Unlike C/C++, each operator yields the left operand as an r-value.

Statements

Each statement must appear inside a function's body or the body of another statement. Aggregates and the global context cannot contain statements.

Scope

scope body

The scope statement is identical to an if(true) statement.

If

if( condition ) body
if( condition ) body else body

The semantics are the same as C/C++. The compiler converts the condition to a boolean and executes the first body if the condition is true or the optional second body if the condition is false. The compiler optimizes away the if statement if the condition is a constant, making the if statement a suitable replacement for the C/C++ #ifdef/#endif directive.

The compiler creates up to three scopes for an if statement: one for each body and one for the whole statement. Each body can access declarations inside the condition but statements above and below the if statement cannot. Declarations in one body are invisible to the other body.

int k:=i //error

if( (int i:=foo()) > (int j:=bar()) )
        return i
else
        return j

int k:=j //error

Select

select( selector ) body

The statement's body must contain only case statements and up to one else statement:

case( value1 , value2 , ... ) body
else body

The select statement is a specialized version of the C/C++ switch statement. The purpose is to select one matching case or the default case by comparing the selector with the value (or values) of each case. Although you can achieve the same semantics with an if, else if, else if, ..., else chain, a select statement is more concise and usually more efficient. The compiler optimizes away the select statement if the selector is a constant, making select a suitable replacement for the C/C++ #if...#elif...#endif directive.

The selector must be a number (or, in the future, must be convertible to a number), which undergoes a unary promotion. Unlike C/C++, the selector can be a floating-point value. Each case value must be a constant. The compiler converts each case value to the selector's type. Duplicate values are an error. Missing values are currently not an error even if the else clause is not present (this will probably change for enumerated types). The else clause need not be reachable (i.e., the cases may already handle every value).

The scope rules are the same as the if statement. Each case can access declarations in the selector but statements above and below the select statement cannot. Declarations inside one case (its values and body) are invisible to other cases.

The order of the cases and else statement doesn't matter. Multiple else statements are an error. Statements besides case and else are illegal.

A case value can be a closed or half-open range. A closed range uses two periods (e.g., 1..2 means [1,2]). A half-open range uses two periods and a less-than sign (e.g., 1..<3 means [1,3)).

A range of values is semantically equivalent to a comma-separated list containing every value in the range. In particular, ranges cannot overlap. The range is unsigned if and only if the selector is unsigned. For example, the range 0xffff_ffff..0 contains two values, -1 and 0, if the selector's type is int and is invalid if the selector's type is uint. Similarly, the range -2..-1 contains two values, 0xffff_fffe and 0xffff_ffff, if the selector's type is uint and is invalid if the selector's type is int. A floating-point range that includes 0.0 automatically includes +0.0 and -0.0. Handling +0.0 in one case and -0.0 in a different case is impossible. A range that includes a NAN or INF follows the IEEE 754 ordering (i.e., -NAN < -INF < -X < 0 < +X < +INF < +NAN).

Implementation Note: The compiler implements the select as a binary search and/or one or more lookup tables.
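As a rough illustration of the binary-search strategy, the following Python sketch dispatches a selector over sorted, non-overlapping ranges. It is not the compiler's code: build_table and select_case are hypothetical names, each case is an inclusive (low, high, label) triple, and the else clause is modeled as a default value.

```python
import bisect


def build_table(cases):
    """Sort (low, high, label) cases by low bound and reject overlaps."""
    ordered = sorted(cases, key=lambda case: case[0])
    for (_, prev_high, _), (next_low, _, _) in zip(ordered, ordered[1:]):
        if next_low <= prev_high:
            raise ValueError("overlapping case ranges")
    lows = [low for low, _, _ in ordered]
    return lows, ordered


def select_case(table, selector, default=None):
    """Binary-search for the case whose range contains the selector."""
    lows, ordered = table
    # Index of the last case whose low bound is <= selector, or -1.
    i = bisect.bisect_right(lows, selector) - 1
    if i >= 0:
        _, high, label = ordered[i]
        if selector <= high:
            return label
    return default  # plays the role of the else clause


table = build_table([(10, 19, "teens"), (1, 2, "low"), (5, 5, "five")])
```

Because the table is sorted once, the order in which the cases are listed does not affect the result, which matches the rule that the order of the cases doesn't matter.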

While

while( condition ) body

The semantics are the same as C/C++. The compiler converts the condition to a boolean and executes the body so long as the condition is true. The body can access declarations in the condition, but statements above and below the while statement cannot. Unlike an if statement, the condition is able to access declarations in the body.

while((int x:=foo())>0)
        bar(x) //ok

bool first:=true
while(first || x==0) //ok, we initialize x before testing its value (the compiler might
{                    //nonetheless issue a warning)
        first:=false
        int x:=foo()
}

while(foo(&x))   //ok, we can evaluate the address of a local variable before
        int x:=bar() //initializing it

Do

do body while( condition )

The semantics of the do statement are the same as the while statement except that the compiler evaluates the condition after executing the body. The condition can access declarations in the body, but statements above and below the do statement cannot.

do
        int x:=foo()
while(x>0) //ok

bool first:=true
do
{
        foo(first || x==0) //ok, we initialize x before testing its value (the compiler might
        first:=false       //nonetheless issue a warning)
}
while((int x:=foo())>0)
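The evaluation order of the do statement can be modeled in Python, which has no do loop of its own. This is a sketch only: do_while is a hypothetical helper, and passing the body's result to the condition stands in for the condition seeing a declaration made in the body.

```python
def do_while(body, condition):
    """Run body, then repeat for as long as condition(result) is true.

    Returns the number of times the body executed.
    """
    iterations = 0
    while True:
        x = body()            # the body always runs before the first test
        iterations += 1
        if not condition(x):  # the condition sees the value the body produced
            break
    return iterations


values = iter([3, 2, 0])
count = do_while(lambda: next(values), lambda x: x > 0)
```

Here the body runs three times: the values 3 and 2 pass the test and 0 ends the loop. A body whose first result already fails the test still runs once.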

For

for( initializer ; condition ; increment ) body

The initializer, condition, and increment are optional. A missing condition is equivalent to true. Otherwise, the compiler converts the condition to a boolean. The increment, if present, must have a side effect. A side effect is any instruction that modifies memory (assignment and increment) or calls a subroutine.

Declarations in each part of the statement (initializer, condition, increment, and body) are visible to every other part of the statement and invisible to statements outside of the for statement. Semantically, the for statement is equivalent to:

scope
{
        initializer
        top::
        if(!condition) //If the condition is true, the condition's finalizers execute after the
                goto bottom   //body and increment execute
        body //The body's finalizers execute after the increment executes
        increment
        goto top
        bottom::
}
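The order in which the four parts execute can be traced with a small Python model of the expansion above. This is a sketch under assumptions: run_for is a hypothetical helper, each part is passed as a callable, and finalizers are not modeled.

```python
def run_for(body, initializer=None, condition=None, increment=None):
    """Execute the parts in the order of the expansion and record a trace."""
    trace = []

    def step(name, part):
        trace.append(name)
        return part()

    if initializer is not None:
        step("initializer", initializer)
    while True:  # top::
        # A missing condition is equivalent to true, so it never exits here.
        if condition is not None and not step("condition", condition):
            break  # goto bottom
        step("body", body)
        if increment is not None:
            step("increment", increment)
    return trace  # bottom::


state = {"i": 0}


def init():
    state["i"] = 0


def cond():
    return state["i"] < 2


def incr():
    state["i"] += 1


trace = run_for(lambda: None, initializer=init, condition=cond, increment=incr)
```

For two iterations the trace is initializer, then condition, body, increment twice, then one final condition that fails. Note that in this model a missing condition loops forever, because the callable body has no way to break out.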

Label

label ::
label :: statement

A label refers to the following statement in the same scope (if any) that isn't an attribute. Placing the label above a statement or in front of it makes no difference. A label is the target of a break or continue (if the label precedes a loop statement) or a goto. All labels in a particular function must have a unique name and must not conflict with other declarations in the function (see Scopes).

void foo()
{
        scope
        {
                label:: //label refers to the call to foo()
                int f:=foo()
                label2:: //label2 doesn't refer to any statements because bar() belongs to the
        }                //enclosing scope
        bar()
        scope
                label:: //error, foo already contains a declaration for label
        goto label //ok, label is visible even though f isn't
}

Goto

goto label

The label must be visible (i.e., must be in the same function as the goto). In future versions of Orth, a goto will not be able to skip the initialization of objects that have finalizers (otherwise, the compiler would have to remember whether to call the finalizer).

Break and Continue

break
continue
break label
continue label

The specified label must precede an enclosing for, do, or while statement. If the label isn't present, the statement refers to the nearest for, do, or while statement.

Return

return
return result

The first form is valid only if the result type is void. The second form is valid for all result types (including void). As in C++, returning an instance of void is a valid NOP.

signed	`byte`, `short`, `int`, `long`, or `dchar`¹
unsigned	`ubyte`, `ushort`, `uint`, `ulong`, `char`, or `wchar`
integer	signed or unsigned
float	`single` or `double`
number	integer or float
fundamental	`void`, `typeof(null)`, `typeof(uninit)`, `bool`, or number
pointer	type`^`
function	type `cdecl(`...`)` or type `stdcall(`...`)`
primitive	fundamental, pointer, or function
array	type`[`count`]`
aggregate	`struct`
composite	array or aggregate
comparable	primitive whose size is nonzero
enumerable	number or pointer
