Unlike C/C++, the order of evaluation is well-defined in debug and release builds. Side effects always occur in the order that they appear in the file.
Orth does not have C++ temporaries. A temporary is an anonymous user-defined type (UDT) that the C++ compiler automatically creates in the middle of an expression. Unlike an r-value, a temporary has an address and it might have a destructor that must execute at a precise time and in a precise order.
The compiler organizes the program's code and data into four sections. It puts the code for each function in the TEXT section, constants in the RDATA section, shared variables with initializers in the DATA section, and shared variables without initializers in the BSS section. The RDATA section is read-only, meaning that any attempt to modify a constant will generate an access violation. The BSS section doesn't occupy any space in the executable. The Windows loader initializes each byte of each variable in the BSS section to zero.
Category | Definition |
---|---|
signed | byte, short, int, long, or dchar [1] |
unsigned | ubyte, ushort, uint, ulong, char, or wchar |
integer | signed or unsigned |
float | single or double |
number | integer or float |
fundamental | void, typeof(null), typeof(uninit), bool, or number |
pointer | type^ |
function | type cdecl(...) or type stdcall(...) |
primitive | fundamental, pointer, or function |
array | type[count] |
aggregate | struct |
composite | array or aggregate |
comparable | primitive whose size is nonzero |
enumerable | number or pointer |
[1] dchar is signed because the maximum Unicode code point is well below 4 billion and signed values are less prone to error than unsigned values.
Type | Size | Alignment |
---|---|---|
void | 0 | 1 |
typeof(null) | 0 | 1 |
typeof(uninit) | 0 | 1 |
bool | 1 | 1 |
char | 1 | 1 |
wchar | 2 | 2 |
dchar | 4 | 4 |
byte | 1 | 1 |
ubyte | 1 | 1 |
short | 2 | 2 |
ushort | 2 | 2 |
int | 4 | 4 |
uint | 4 | 4 |
long | 8 | 8 |
ulong | 8 | 8 |
single | 4 | 4 |
double | 8 | 8 |
pointer | word [2] | word [2] |
function | word [2] | word [2] |
[2] Throughout this document, I use word to mean int on 32-bit machines and long on 64-bit machines.
An empty type, such as void, int[0], or struct{}, always has a valid nonnull address. Empty shared and local variables have a unique address. Empty member variables typically share the same address as another member. Copying an empty type is a NOP. For instance, the following code is legal but produces no assembly instructions. Returning an empty type leaves the function's result (EAX on x86) uninitialized.
    void a
    void b:=a
    a:=b
The main purpose of void is to declare a function that doesn't return a value. void is also useful as a placeholder for irrelevant or meaningless data. For instance, if we had a map template with two parameters, First and Second, we could define a set of integers by using int for First and void for Second.
The null keyword replaces NULL in C and 0 in C++. null is a constant whose type is typeof(null). Since typeof(null) is awkward to type, you'll probably want to give it a shorter name like Null using a typedef.
null converts to any pointer or function type. The conversion yields a pointer with all bits cleared. null is also a convenient way of clearing user-defined types. For instance, we could define a structure whose constructor took a single typeof(null) argument. We could then clear the structure using the same syntax as that for clearing a pointer.
uninit is a constant value whose type is typeof(uninit). Since typeof(uninit) is awkward to type, you'll probably want to give it a short name like Uninit using a typedef.
uninit converts to any type. The compiler initializes small variables (like int) with the bit pattern 0y110011001100... in debug builds and leaves large variables (like arrays) uninitialized. The keyword is useful for circumventing a compiler warning about an uninitialized variable. For example:
    int x:=uninit
    if(foo())
        x:=bar()
    ...
    if(foo())
        baz(x) //Without the uninit initializer, the compiler might complain that x is
               //uninitialized here
Future versions of Orth will require an initializer for each member of a structure. uninit is a convenient way of emphasizing that an array initially contains garbage:
    struct X
    {
        int a:=0
        int[100] b         //warning: b has no initializer
        int[100] c:=uninit //ok, c initially contains garbage
    }
Finally, a user-defined dynamic array might want to provide an optimized insertion function that leaves the new elements uninitialized:
    struct ArrayOfInts
    {
        void appendn(int n, int value) {...}      //append n copies of value
        void appendn(int n, typeof(uninit)) {...} //reserve space for n integers but don't
                                                  //initialize them
    }
Orth defines three character types: char, wchar, and dchar. A char represents a single ASCII character in the range [0x00,0x7F] [3] or a single byte of a UTF8 string. A wchar represents a Unicode character in the range [0x0000,0xFFFF] or a 16-bit unit of a UTF16 string. A dchar represents any Unicode character.
A character literal is a constant dchar. Converting a constant dchar to a char generates a warning if the value is outside the range [0x00,0xFF], and converting a constant dchar to a wchar generates a warning if the value is outside the range [0x00,0xFFFF]. char and wchar are unsigned whereas dchar is signed.
[3] A char can also represent an extended character (0x80 through 0xFF) in the current locale's code page (typically Windows 1252 for English speakers). Since I hate working with locales, I always use Unicode for non-ASCII characters.
single is a single-precision floating-point value that is equivalent to float in C/C++. double is a double-precision floating-point value. Throughout this document, float refers to any floating-point type (single precision or double precision).
A pointer type uses the syntax type^. The type in front of the caret is called the base type. Orth uses a caret in place of an asterisk to make the syntax easier to parse. The base type can be any type including void, typeof(null), and another pointer type.
A function type uses the following syntax:
    result cdecl(param0, param1, ...)
    result stdcall(param0, param1, ...)
    result internal(param0, param1, ...)
The first two types are external function types. The third type is an internal function type. Orth uses an external calling convention (cdecl or stdcall) for calls to functions in shared libraries and an internal calling convention (simply internal) for calls to Orth functions. The specifics of the internal convention (argument order, argument size, caller/callee stack cleanup, etc.) are nonstandard and subject to change. On x64, cdecl and stdcall are equivalent because x64 has a single calling convention. Orth does not support the fastcall and thiscall conventions.
Unlike C/C++, a function is a pointer, so an asterisk (C/C++ syntax) or caret (Orth syntax) is unnecessary. Similarly, taking the address of a function is redundant (and illegal in Orth) because accessing a function gives you a pointer r-value.
Similar to C/C++, each parameter can be either a type or a variable declaration. In the latter case, the compiler ignores the variable's symbol.
Of the three types above, only the first two are valid Orth code. The third is an internal compiler type that you'll see only in conversion errors and such.
    int cdecl(int) f    //ok
    int stdcall(int) f  //ok
    int internal(int) f //error: invalid type
By default, each function declaration uses the internal calling convention. You can change a function's calling convention using the cdecl and stdcall attributes. Exporting a function or declaring a function import automatically uses the cdecl convention.
    int foo(int x) //foo uses internal calling convention
        return x+1
    cdecl int bar(int x) //bar uses cdecl calling convention
        return x+1
    int cdecl(int y) f:=foo //invalid: cannot convert 'int internal(int)' to 'int cdecl(int)'
    f:=bar //ok, f is now an alias for bar
           //the function type's parameter names need not match bar's parameter names
An array type uses the syntax type[count]. The type in front of the bracket is called the element type.
An array can have zero elements, but be careful with the struct hack. The struct hack is where you put a zero-sized array at the end of a structure as a placeholder for an array of unknown size. Since Orth automatically rearranges structure members, either refactor the code or use cdecl struct to force the compiler to lay out the structure using the C/C++ ABI.
An array can have a maximum of 1,048,576 (2^20) elements, mainly as a sanity check. If you try to create an array with 0x7FFF_FFFF elements, then your code probably contains a mistake.
The scopes in an Orth program form a tree. There is a single root: the global scope. Each source file contributes declarations to the same global scope. Each aggregate declaration creates one subscope and each function declaration creates two nested subscopes: one for the parameters and one for the body. For example, the program below has ten scopes:
    struct S
    {
        int a
        struct T
            shared int b
        int foo(int c)
        {
            int d:=c+1
            return d
        }
    }
    int bar(int e)
    {
        label1::
        if((int f:=e*2)<10)
        {
            label2::
            int g:=f+1
            return f
        }
        else
        {
            int g:=f-1
            return f
        }
    }
    int h
These ten scopes form the following tree. (Parenthesized symbols indicate the scope of each declaration.)
- global (S, bar, h)
  - S (a, T, foo)
    - T (b)
    - foo (c)
      - foo (d)
  - bar (e)
    - bar (label1 and label2)
      - if (f)
        - block (g)
        - block (g)
Even though b is shared, it still belongs to the S.T scope. e, f, and g have the same lifetime (they are each part of bar's stack frame) but they belong to different scopes. A variable's scope determines visibility and not storage. A label always belongs to the body of the innermost function that contains it.
By default, shadowing causes a warning. A declaration shadows another declaration if (1) the declarations declare the same symbol; and (2) the shadowed declaration is visible from the shadowing declaration's scope (i.e., looking up the symbol in the shadowing declaration's scope succeeds). Shadowing causes subtle errors because declaring a new symbol can silently change the meaning of existing code. Occasionally, shadowing is useful, so Orth lets you suppress the shadowing warning using the shadow attribute. The shadow attribute allows a declaration to shadow a declaration in a different scope. A declaration can never shadow a declaration in the same scope. The compiler doesn't allow function parameters to shadow.
    void foo(int i)
    {
        int i        //warning: i shadows a parameter
        shadow int i //ok
        if((int i:=bar())>0) //warning: i shadows a local variable
        {
            int i        //warning: i shadows a local variable
            int i        //error: this scope already contains i
            shadow int i //error: this scope already contains i
            int j
        }
        else
            int j //ok, first j is invisible from this scope
    }
Symbol lookup is simple in Orth 0.3. Starting in the scope of the lookup and moving outward toward the global scope, the compiler stops at the first enclosing scope that declares the symbol.
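The lookup rule above can be illustrated with a short Python sketch. This is only a model of the rule as described, not the compiler's actual implementation; the Scope class, the lookup function, and the scope names are hypothetical.

```python
class Scope:
    """One node in the scope tree; parent is None for the global scope."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = set()

    def declare(self, symbol):
        self.symbols.add(symbol)


def lookup(scope, symbol):
    """Return the innermost enclosing scope that declares symbol, or None."""
    while scope is not None:
        if symbol in scope.symbols:
            return scope
        scope = scope.parent  # move outward toward the global scope
    return None


# Hypothetical tree: global -> bar's parameters -> bar's body
global_scope = Scope("global")
global_scope.declare("h")
bar_params = Scope("bar parameters", global_scope)
bar_params.declare("e")
bar_body = Scope("bar body", bar_params)

found = lookup(bar_body, "e")  # resolves in the parameter scope
```

Under this model, a shadowing check is simply a lookup that succeeds in an enclosing scope before the new declaration is added.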
You can make a declaration anonymous by using anon in place of the declaration's symbol (or omitting the symbol of a struct with braces). An anonymous declaration will not conflict with any other anonymous declarations. Accessing an anonymous declaration is impossible. anon is useful for declaring RAII objects whose only purpose is to initialize something in their constructor and finalize something in their destructor. anon is useless for a typedef, so the compiler doesn't allow it.
The shared keyword replaces the C/C++ static keyword. Every thread in the process shares the value of a shared variable. Every declaration in the global scope and every constant is automatically shared. [In Orth 0.3, every function is automatically shared also.]
Unlike C++, shared variables cannot have constructors or destructors. Additionally, their initializers must be constant. This limitation simplifies the language enormously while retaining most of the benefits of object-oriented programming. For example, Orth avoids the C++ initialization-order problem where one global variable accesses a global variable in a different module before it's initialized. Orth also makes it easier to walk the heap and report memory leaks. Since Orth doesn't have global finalizers that release heap memory, a program doesn't need to force each global finalizer to execute prior to running a leak test.
The const keyword indicates a value that is known at compile time or link time. For instance, the size of a pointer is known at compile time and the address of a shared variable is known at link time. Each constant must have an initializer. Unlike C/C++, a constant is automatically shared, so writing shared const is redundant. Orth doesn't support "readonly" variables. In other words, you can't instantiate a local variable, initialize it, and make it readonly for the remainder of its lifetime.
Accessing a constant from any point in the code automatically evaluates and substitutes the constant's value. The constant's value may, in turn, depend on other constants whose values are then substituted into the expression. The process continues until either (1) the compiler is able to determine the value of all constants in the chain; or (2) the compiler detects a dependency cycle. In the latter case, the compiler issues an error and "gives up" on every constant and expression involved in the dependency cycle. The example below contains four constants with an illegal dependency cycle involving c, e, and f.
    int foo()
        return c //to evaluate c, the compiler follows the steps below
    const int c:=d+e
    const int f:=c
    const int d:=1
    const int e:=f
1. c is a constant, so the compiler substitutes its value, d+e
2. d is a constant whose value is 1, so the expression becomes 1+e
3. e is a constant whose value is f, so the expression becomes 1+f
4. f is a constant whose value is c, so the expression becomes 1+c
5. Since the compiler is already trying to determine the value of c in step 1, it issues an error and gives up on c, d, e, and f. It will subsequently ignore any expressions that make use of these constants.
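The substitution steps above amount to a depth-first evaluation that tracks which constants are currently being evaluated. The Python sketch below illustrates the idea under a deliberately tiny expression model (a constant is either an integer or a sum of other constants); the evaluate function and the dictionary layout are assumptions made for illustration, not the compiler's data structures.

```python
def evaluate(name, constants, in_progress=None):
    """Evaluate a constant by substituting the constants it depends on.

    constants maps a name to an int or to a list of names to be summed
    (a simplified expression model assumed for this sketch).
    """
    if in_progress is None:
        in_progress = []
    if name in in_progress:
        # The constant is already being evaluated further up the chain.
        cycle = in_progress[in_progress.index(name):] + [name]
        raise ValueError("dependency cycle: " + " -> ".join(cycle))
    in_progress.append(name)
    definition = constants[name]
    if isinstance(definition, int):
        value = definition
    else:
        value = sum(evaluate(dep, constants, in_progress) for dep in definition)
    in_progress.pop()
    return value


# The four constants from the example: c:=d+e, f:=c, d:=1, e:=f
constants = {"c": ["d", "e"], "f": ["c"], "d": 1, "e": ["f"]}
try:
    evaluate("c", constants)
    message = "no error"
except ValueError as error:
    message = str(error)  # "dependency cycle: c -> e -> f -> c"
```

Note that d evaluates successfully on its own in this model; only the constants on the cycle fail.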
Orth imports let you call functions and access data in DLLs. Calling an imported function (or accessing an imported variable) is a three-step process.
The first step is to write a pragma(lib) directive containing the path of the shared library you want to import, such as pragma(lib,"kernel32.dll"). The scope of the pragma directive doesn't matter. The library path is case insensitive and can use either forward slashes or backslashes. The library path can be an absolute path or a relative path. If the path is relative, then the compiler uses the following search order to locate the library:
1. Each path specified by a pragma(lib_path) directive. Each lib_path can be either absolute or relative to the input directory (i.e., the first argument to the compiler).
2. The Windows directory
3. The Windows system directory
4. Each directory in the PATH environment variable
A library path consists of an optional location (e.g., the part preceding the final forward slash or backslash) and a required name. If the location isn't present, then the location defaults to ".". The required name must exactly match the filename on disk (i.e., the compiler doesn't automatically add the .dll extension). If the program contains multiple pragma(lib) directives with the same name, then the location of each must match notwithstanding case and forward/backslash conversions. For instance:
    pragma(lib,"./kernel32.dll")
    pragma(lib,"KERNEL32.DLL") //ok, location is "." in both cases
    pragma(lib,"C:/WINDOWS/SYSTEM32/KERNEL32.DLL") //error: "C:/WINDOWS/SYSTEM32" conflicts with "."
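The location/name rule can be sketched in Python as follows. This is a purely textual model of the rule as described (lowercase the path, unify the slashes, split at the final slash); split_library_path and check_libraries are hypothetical names, and the sketch does not resolve paths against the file system.

```python
def split_library_path(path):
    """Split a library path into (location, name), ignoring case and slash style."""
    normalized = path.replace("\\", "/").lower()
    location, slash, name = normalized.rpartition("/")
    if not slash:
        location = "."  # a missing location defaults to "."
    return location, name


def check_libraries(paths):
    """Return {name: location}; raise if one name appears with two locations."""
    libraries = {}
    for path in paths:
        location, name = split_library_path(path)
        previous = libraries.setdefault(name, location)
        if previous != location:
            raise ValueError(f'"{location}" conflicts with "{previous}"')
    return libraries


ok = check_libraries(["./kernel32.dll", "KERNEL32.DLL"])
```

Passing the third path from the example ("C:/WINDOWS/SYSTEM32/KERNEL32.DLL") together with either of the first two raises the conflict error in this model.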
The second step is to write a function or variable import as follows:
    import convention type function_name(param0, param1, ...)
    import convention type variable_name
The convention is either cdecl or stdcall. If you omit the convention, then the convention defaults to cdecl. x64 has a single calling convention, so the choice between cdecl and stdcall is arbitrary.
Notice that the function import lacks a body. The compiler distinguishes function imports from variable imports by looking for a parenthesized argument list after the name. A variable import cannot have an initializer. Each function parameter can be either a type or a variable declaration. In the latter case, the compiler ignores the variable's name.
The import declares a symbol in the current scope. For instance, you could organize all of the KERNEL32 imports in a structure named Kernel32. Imports are automatically shared, so you could call an import named foo using Kernel32.foo. An import's name must be an identifier containing only ASCII characters (_, $, a-z, A-Z, and 0-9).
The third step is to call the imported function (or access the imported variable) exactly as you would any other function.
The compiler uses the following algorithm to locate each import.
First, the compiler searches the code for pragma(lib) directives and makes a list of available libraries. It ignores scope and merges pragma(lib) directives that refer to the same library using the library's name (the part after the final forward slash or backslash). For example, foo.dll, ../FOO.DLL, and c:\bar\Foo.Dll all refer to the same library because the name of each one (converted to lowercase) is foo.dll. If a program contained a pragma(lib) for any two of these paths, the compiler would print an error because it wouldn't know which string to insert into the executable's import table (even if each path resolved to the same file on disk). Next, the compiler opens each available library, extracts the exported symbols, and makes a table of available symbols. Third, it scans the code for import declarations and looks up each symbol in the table. If there is more than one occurrence in the table, the import is ambiguous and the compiler issues an error. If there are no occurrences, then the import is undefined, and the compiler issues an error. Otherwise, the compiler remembers which library contains the import.
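The symbol-table step of this algorithm can be sketched in Python. The sketch assumes the exported symbols of each library have already been extracted into a dictionary; resolve_imports and the library names a.dll and b.dll are hypothetical.

```python
def resolve_imports(exports_by_library, imports):
    """Map each imported symbol to the single library that exports it."""
    # Build the table of available symbols: symbol -> libraries exporting it.
    table = {}
    for library, symbols in exports_by_library.items():
        for symbol in symbols:
            table.setdefault(symbol, []).append(library)

    resolved = {}
    for symbol in imports:
        candidates = table.get(symbol, [])
        if len(candidates) > 1:
            raise LookupError(f"import '{symbol}' is ambiguous: {candidates}")
        if not candidates:
            raise LookupError(f"import '{symbol}' is undefined")
        resolved[symbol] = candidates[0]  # remember the containing library
    return resolved


# Hypothetical libraries and exports
exports = {"a.dll": ["open", "close"], "b.dll": ["close", "read"]}
resolved = resolve_imports(exports, ["open", "read"])
```

In this example, importing close would fail as ambiguous because both hypothetical libraries export it, and importing write would fail as undefined.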
For cdecl imports, the compiler looks up the name exactly as you typed it. For stdcall imports, the compiler adds an underscore to the front of the symbol and an at sign followed by the total number of parameter bytes to the end (e.g., _foo@4). Unfortunately, Windows DLLs such as KERNEL32.DLL don't decorate their exports (even though they are stdcall), so the compiler is unable to match them properly. To remedy this problem, the pragma(undecorated) attribute forces the compiler to look up a stdcall import exactly as you typed it.
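The decoration rule is small enough to state as a Python sketch. The function name decorated_name and its parameters are hypothetical; the sketch models only the rule described above and ignores the x64 case, where the two conventions coincide.

```python
def decorated_name(symbol, convention, parameter_bytes=0, undecorated=False):
    """Return the name to look up in a library's export table."""
    if convention == "cdecl" or undecorated:
        return symbol  # looked up exactly as typed
    if convention == "stdcall":
        # leading underscore, trailing at sign and parameter byte count
        return f"_{symbol}@{parameter_bytes}"
    raise ValueError(f"unsupported convention: {convention}")


plain = decorated_name("foo", "cdecl")
decorated = decorated_name("foo", "stdcall", parameter_bytes=4)
raw = decorated_name("foo", "stdcall", parameter_bytes=4, undecorated=True)
```

Here decorated is "_foo@4", while plain and raw are both "foo".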
Assuming that no errors occurred, the compiler inserts an import table into the executable or DLL containing the (possibly decorated) name of each import and the library which contains it. At run time, the Windows loader resolves the address of each import in this table.
You can export any cdecl or stdcall function and any shared variable using the export attribute. Exports are useful only when you're building a DLL. The compiler inserts an export table in the DLL containing the decorated name and address of each exported declaration. For cdecl functions, the decorated name in the table matches the declaration's symbol. For stdcall functions, the decorated name has a leading underscore and a trailing at sign followed by the number of parameter bytes (e.g., _foo@4). Each export must have a unique decorated name (but not necessarily a unique name). An export's name must be an identifier containing only ASCII characters (_, $, a-z, A-Z, and 0-9).
Orth 0.3 supports only one type of aggregate: a struct. We refer to the nonshared variable declarations in an aggregate as member variables. Unlike shared variables, member variables don't occupy memory in the program's image. Instead, a member variable is simply an offset from the beginning of the aggregate. Orth 0.3 doesn't support initializers for member variables. Later versions of Orth will support constant member initializers. By default, the compiler is free to rearrange the member variables for the optimal fit. Occasionally, a program needs to transfer a structure to or from a shared library. For these situations, the cdecl and stdcall attributes force the compiler to lay out the structure using the C/C++ ABI. cdecl and stdcall are equivalent. For example:
    cdecl struct Foo
    {
        byte a
        int b
        byte c //cdecl forces the compiler to place c after b
    }
    import cdecl void bar(Foo^)
Since a Foo^ is an argument to an imported function, bar, the compiler emits a warning if you forget the cdecl attribute in front of the struct.
The compiler uses the following procedure to lay out a structure:
1. The compiler determines each member's alignment. A member uses its type's default alignment unless it has an alignas() attribute. alignas() has a single argument: a constant integer (1, 2, 4, or 8) or a type. The compiler uses the specified integer or the specified type's alignment in place of the default alignment. The new alignment can be less than or greater than the old alignment.
2. If the structure doesn't use the C/C++ ABI (i.e., it lacks the cdecl and stdcall attributes), then the compiler sorts the members in ascending alignment order. Members with the same alignment remain in the same relative order.
3. The compiler determines the structure's alignment, which you can override with an alignas() attribute in front of the aggregate keyword. As before, alignas() has a single argument: a constant integer (1, 2, 4, or 8) or a type.
4. [...] alignas(1).
5. The compiler sets currentOffset to zero.
6. For each member, the compiler rounds up currentOffset to a multiple of the member's alignment. This value becomes the member's offset. Then, the compiler increases currentOffset by the member's size.
7. The compiler rounds up currentOffset to a multiple of the structure's alignment. This value becomes the structure's size.
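The offset arithmetic in this procedure (start currentOffset at zero, round it up to each member's alignment, add the member's size, then round the total up to the structure's alignment) can be sketched in Python. The lay_out function is hypothetical, and the sketch makes one assumption the text doesn't confirm: when no structure alignment is given, it uses the largest member alignment.

```python
def round_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def lay_out(members, c_abi=False, struct_alignment=None):
    """members is a list of (name, size, alignment); returns (offsets, size)."""
    if not c_abi:
        # sorted() is stable, so equal alignments keep their relative order
        members = sorted(members, key=lambda member: member[2])
    if struct_alignment is None:
        # Assumption for this sketch: default to the largest member alignment.
        struct_alignment = max((m[2] for m in members), default=1)
    current_offset = 0
    offsets = {}
    for name, size, alignment in members:
        current_offset = round_up(current_offset, alignment)
        offsets[name] = current_offset
        current_offset += size
    return offsets, round_up(current_offset, struct_alignment)


# Foo from the example: byte a, int b, byte c
foo = [("a", 1, 1), ("b", 4, 4), ("c", 1, 1)]
cdecl_layout = lay_out(foo, c_abi=True)  # ({'a': 0, 'b': 4, 'c': 8}, 12)
default_layout = lay_out(foo)            # ({'a': 0, 'c': 1, 'b': 4}, 8)
```

Under these assumptions, declaration order costs Foo four bytes of padding compared with the sorted layout.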
An aggregate can contain only declarations and pragmas.
A typedef is like a constant in many respects. Accessing a typedef automatically evaluates its type, which can recursively depend on other typedefs, structures, constants, etc. The compiler issues an error for infinite recursion. The compiler evaluates only the type and not the size or alignment of that type. For example:
    struct S
    {
        S a  //illegal: size of S depends on size of a, which depends on size of S
        S^ b //ok: size of S depends on size of b, which is unrelated to the size of S
        T^ t //evaluate T, which is an S, but don't evaluate the size of S because
             //we don't know it yet!
        typedef T:=S
    }
A parameter declares a local variable in the function's scope. Each parameter must declare exactly one variable (e.g., int x,y would be ambiguous because y looks like a second parameter). A parameter without a name is anonymous (e.g., int by itself is synonymous with int anon). Future versions of Orth will support default initializers and keyword arguments.
Unlike C/C++, the compiler passes arrays by value.
A compound initializer initializes each element of an array or each member of an aggregate. Orth allows you to reorder the initializers using designators. An array designator is a constant integer, a constant closed/clopen range of integers, or two periods (..) followed by a colon. An aggregate designator is the name of an aggregate member followed by a colon.
    struct X { int a; int[10] b }
    X x:={b:{1..3:1,0:0,..:2},a:1}
Initializing an element/member more than once is illegal. Unlike C/C++, neglecting to initialize an element/member causes an error. An initializer without a designator initializes the element/member after the last element/member initialized by the previous initializer (if there is one) or causes an error (if there isn't). The default initializer (..) initializes the remaining uninitialized elements of an array (if any). Side effects occur in the same order as their position in the file. The compiler evaluates the default initializer (if present) exactly once in all cases.
Like C/C++, a pragma() directive is a practical consideration that is often platform-specific or compiler-specific. There are two forms of pragmas: statements and attributes. A statement pragma appears on a separate line anywhere in a source file. Its scope isn't important. The compiler silently merges redundant pragmas and generates a warning for incompatible pragmas (e.g., specifying base_address as 0x10000000 in one file and base_address as 0x20000000 in another file). An attribute pragma appears in front of an expression (typically a declaration) and affects the compiler's output for that expression.
Orth 0.3 supports the following pragma() directives:
pragma(dll)
By default, the compiler creates an application. This directive instructs the compiler to generate a library instead.
pragma(base_address, integer literal)
The program occupies a contiguous block of virtual memory starting at the image's base address. You can explicitly set the base address using this directive. The second argument must be an integer literal. pragma(base_address) is useful for resolving an address conflict between two libraries. If two libraries overlap (i.e., they attempt to use the same region of virtual memory), the system must rebase one of them to a different address. Rebasing slows down program startup.
pragma(lib, string literal)
The pragma(lib) directive tells the compiler to search the specified library for imports. The second argument must be a string literal. See Imports for more information.
pragma(lib_path, string literal)
The directive must appear in the global scope. By default, the compiler searches the Windows directory, the Windows system directory, and each directory in the PATH for libraries. pragma(lib_path) tells the compiler to search the specified path before the paths above. The path can be relative or absolute. If the former, then it is relative to the input directory (not the location of the file containing the directive). The compiler merges paths that match (notwithstanding case and forward/backslash conversions) and puts them in an arbitrary order which doesn't necessarily match their order in the file.
pragma(undecorated) declaration
This directive appears in front of an imported stdcall function. It instructs the compiler to look up the import exactly as you typed it rather than decorating it with a leading underscore, trailing at sign, and trailing parameter byte count. See Imports for more information.
There are two forms of attributes. Checked attributes appear in front of an expression (typically a declaration) and affect the meaning of that expression. The compiler prints a warning if the attribute has no effect or if the attribute is repeated. Unchecked attributes appear above a statement (or a group of statements surrounded by braces). The compiler prints a warning if the unchecked attribute is repeated but silently ignores an unchecked attribute that has no effect on any statement beneath it.
    const export //two unchecked attributes
    {
        const int x:=123 //warning: const is repeated
        void foo() {} //ok: the compiler silently ignores const
        alignas(4) void foo() {} //warning: alignas() has no effect
        const alignas(4) //warning: const is repeated
        void bar() {} //ok: the compiler silently ignores const and alignas()
    }
Orth 0.3 supports the following attributes:
- shared (see Shared Declarations)
- const (see Constant Declarations)
- import (see Imports)
- export (see Exports)
- cdecl (see Imports, Exports, and Aggregates)
- stdcall (see Imports, Exports, and Aggregates)
- alignas (see Aggregates)
- shadow (see Shadowing)
Each expression must appear inside a function's body or the body of a statement. Aggregates and the global context cannot contain expressions.
The compiler categorizes each expression in three different ways:
Unlike C/C++, many operators work on types as well as nontypes. For instance, applying the pointer operator to a type, int, produces another type, int^. Applying the pointer operator to a variable whose type is int^ dereferences it.
If an expression isn't a type, then the compiler might not know its type because it is an integer literal (e.g., 123) or an expression containing an integer literal (e.g., x?1:2). Unlike C/C++, Orth doesn't immediately assign a type to integer literals. Instead, it waits until the literal's type becomes clear from its context (e.g., the program converts the literal or an expression containing the literal to a particular number type). If the compiler knows the expression's type, then the expression is a typed value.
A typed value is a compiled sequence of instructions. If the instructions calculate an address and load a value from memory, then the typed value is an l-value. An l-value can appear on the left-hand side of an assignment because the compiler is able to replace the instruction that loads a value from memory with an instruction that stores a new value in memory. For example, arr[1].mem is an l-value consisting of four instructions:
1. a = address of arr
2. b = a + sizeof(arr[0])
3. c = b + offset of mem
4. load the value at address c
Supposing we then assign 2 to arr[1].mem, the fourth instruction becomes a store:
1. a = address of arr
2. b = a + sizeof(arr[0])
3. c = b + offset of mem
4. store 2 to address c
If the instructions don't ultimately load a value from memory, then the typed value is an r-value. An r-value cannot appear on the left-hand side of an assignment.
Orth infers the type of an integer literal from its context, making C/C++ integer suffixes unnecessary. When the compiler sees an integer literal, it creates a placeholder for it called a typeless integer. Converting the typeless integer to a specific type (e.g., initializing long l with 0x1234_1234_1234_1234) replaces the typeless integer with a typed value. Operations involving only typeless integers yield another typeless integer (e.g., 1+2 yields 3) or a constant bool (e.g., 1<2 yields true). Operations involving a combination of typeless integers and typed values yield typed values in some cases (e.g., 1+x) and typeless expressions in other cases (e.g., 1<<x and b?2:3). A typeless expression can contain an arbitrarily large number of operations (e.g., a?1:b?2:c?3:4). The type of each integer literal in the typeless expression is unknown until the type of the outermost typeless expression is known (e.g., converting a?1:b?2:c?3:4 to a uint converts 1, 2, 3, and 4 to uint).
There are 16 operations that can produce typeless expressions:
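    -typeless
    ~typeless
    typeless + typeless
    typeless - typeless
    typeless * typeless
    typeless / typeless
    typeless << int
    typeless >> int
    typeless % typeless
    typeless & typeless
    typeless @ typeless
    typeless | typeless
    expression , typeless
    bool ? typeless : typeless
    unreachable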
For 9 of these operations, the compiler is able to infer the type of the typeless expression from the other operand:
    typeless + T → T
    T + typeless → T
    typeless - T → T
    T - typeless → T
    typeless * T → T
    T * typeless → T
    typeless / T → T
    T / typeless → T
    typeless % T → T
    T % typeless → T
    typeless & T → T
    T & typeless → T
    typeless @ T → T
    T @ typeless → T
    typeless | T → T
    T | typeless → T
    bool ? typeless : T → T
    bool ? T : typeless → T
Type inference is not possible for shifting (<< and >>) and side effects (,). For example, the types of 1<<(int i:=foo()) and (int i:=foo()),1 are unknown without examining the expression that contains them.
Each typeless integer is able to convert itself to an integer or float type. Each typeless unary operation (- and ~) and binary operation (+, -, *, <<, >>, /, %, &, @, and |) is able to convert itself to any integer type. The comma operator (e.g., (foo(),1)) is able to convert itself to type T if the second operand is able to convert itself to type T. The conditional operator (e.g., b?2:3) is able to convert itself to type T if the second and third operands are able to convert themselves to type T. The unreachable assertion is able to convert itself to any type because a working program will never reach the unreachable assertion.
If a typeless integer or typeless expression isn't able to convert itself to a type, then it converts itself to int. For example, consider the expression single s:=b?1<<x:2. If we naively infer the type of 1 and 2 to be single, then it's unclear how we shift 1.0 by x bits. Instead, we infer 1 to be int. Since 1<<x is then an int, 2 must also be an int. Some operators work equally well with floating point, but we nonetheless evaluate them using integer arithmetic for consistency with the shift operators. For example, we could evaluate single s:=(foo(),1)+2 by converting 1 and 2 to a single and using floating-point addition, but the presence of an arithmetic operator dictates that we convert 1 and 2 to int instead.
In rare instances, the compiler is unable to infer the type of a typeless expression from its context. There are 5 such instances:
for
statement
for
statement
In each instance, the compiler converts each typeless integer or expression to int
.
Orth 0.3 supports the following conversions:
The compiler issues an error if the conversion loses significant digits. If the conversion type is an N-bit signed type, then the value must fit within the signed N-bit range (i.e., bits N-1 and above must all be set or cleared). If the conversion type is an N-bit unsigned type, then the value must fit within the signed N+1-bit range (i.e., bits N and above must all be set or cleared).
Conversion type | Accepted range |
---|---|
byte | [-0x80,0x7F] |
ubyte , char | [-0x100,0xFF] |
short | [-0x8000,0x7FFF] |
ushort , wchar | [-0x10000,0xFFFF] |
int | [-0x8000_0000,0x7FFF_FFFF] |
uint | [-0x1_0000_0000,0xFFFF_FFFF] |
long | [-0x8000_0000_0000_0000,0x7FFF_FFFF_FFFF_FFFF] |
ulong | all |
These rules are necessary because the compiler complements a typeless integer by negating it and subtracting one. For example, ~1 is the same as -2 (both have the bit pattern ...111111110). Since we want to be able to initialize an N-bit unsigned type with any complemented N-bit value, the compiler must accept any value in the range [-2^N, -1] in addition to any value in the range [0, 2^N - 1].
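The range rule can be checked mechanically. The following Python sketch is illustrative only: the helper name `fits` is hypothetical (it is not part of Orth), and it models a typeless integer as an ordinary Python integer.

```python
def fits(value: int, bits: int, signed: bool) -> bool:
    """Return True if a typeless integer converts to the type without an error."""
    if signed:
        # N-bit signed type: the value must lie in the signed N-bit range.
        return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1
    # N-bit unsigned type: the value must lie in the signed (N+1)-bit range,
    # i.e. [-2**N, 2**N - 1], so that complemented values such as ~1 are accepted.
    return -(1 << bits) <= value <= (1 << bits) - 1


print(fits(~1, 8, signed=False))     # complemented value converted to ubyte
print(fits(0x100, 8, signed=False))  # needs a ninth bit
print(fits(-0x81, 8, signed=True))   # one below the byte range
```

With these definitions, the 8-bit rows of the table correspond to `fits(v, 8, True)` for byte and `fits(v, 8, False)` for ubyte and char.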
An overflow is impossible. The rounding mode is round to nearest if the conversion isn't exact.
unreachable
to any type
The conversion generates an INT3
instruction that breaks into the debugger.
single
to double
double
to single
The compiler generates a warning if converting a constant double
to single
would overflow
(i.e., yield +INF or -INF). The rounding mode is round to nearest if the conversion isn't exact.
The rounding mode is round to nearest if the conversion isn't exact.
The compiler generates a warning if converting a constant float to an integer would overflow. The conversion truncates the value towards zero.
The conversion uses zero extension when the original type is unsigned and sign extension when the original type is signed.
The conversion cannot overflow. The compiler reinterprets the bits as a signed or unsigned value.
The conversion truncates significant bits. The compiler prints a warning when a constant conversion overflows. A conversion from an N-bit integer to an M-bit integer causes an overflow if the top N-M bits don't match each other. In addition, a conversion from a signed integer to a smaller signed integer also overflows if the value's sign changes. The following conversions from 16 bits to 8 bits are possible:
ushort(0x0000) → ubyte(0x00) |
short(0x0000) → ubyte(0x00) |
ushort(0x0000) → byte(0x00) |
short(0x0000) → byte(0x00) |
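As a rough illustration of the overflow rule for narrowing conversions, the following Python sketch applies the two stated conditions to a raw bit pattern. The function name and parameters are hypothetical, and the sketch assumes the rule applies exactly as worded above.

```python
def narrowing_overflows(pattern: int, n: int, m: int, both_signed: bool) -> bool:
    """pattern is the N-bit pattern of the source value (0 <= pattern < 2**n)."""
    top = pattern >> m                    # the N-M bits that the conversion discards
    all_clear = top == 0
    all_set = top == (1 << (n - m)) - 1
    if not (all_clear or all_set):
        return True                       # the discarded bits don't match each other
    if both_signed:
        source_sign = (pattern >> (n - 1)) & 1
        result_sign = (pattern >> (m - 1)) & 1
        return source_sign != result_sign  # the value's sign changed
    return False


print(narrowing_overflows(0x0100, 16, 8, both_signed=False))  # ushort to ubyte
print(narrowing_overflows(0x0080, 16, 8, both_signed=True))   # short to byte
print(narrowing_overflows(0xFF80, 16, 8, both_signed=True))   # short(-128) to byte
```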
void^
The underlying bits are unaffected.
typeof(null)
to pointer or function
The resulting pointer or function has all bits cleared. See null.
The result is undefined because the program should never access it. In practice, the compiler initializes primitives with the repeating bit pattern 0xCCCC
...
and leaves composites uninitialized (for the sake of efficiency). See uninit.
Some unary operators "promote" their operand according to this table. The promotion rules are the same as C/C++.
double or single → double
ulong → ulong
long → long
uint → uint
int, dchar, short, ushort, wchar, byte, ubyte, or char → int
Some binary operators "promote" their operands according to this table. The promotion rules are the same as C/C++.
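If either operand is double, the common type is double.
Otherwise, if either operand is single, the common type is single.
Otherwise, if either operand is ulong, the common type is ulong.
Otherwise, if either operand is long, the common type is long.
Otherwise, if either operand is uint, the common type is uint.
Otherwise, the common type is int.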
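The binary promotion table can be read as a ranked lookup. The Python sketch below assumes that each row applies when either operand has the listed type and that the rows are checked from top to bottom; the names `RANK` and `common_type` are hypothetical.

```python
# Types in the order the table lists them; anything else promotes to int.
RANK = ["double", "single", "ulong", "long", "uint"]


def common_type(left: str, right: str) -> str:
    """Pick the common type of two operand types, given by name."""
    for candidate in RANK:
        if candidate in (left, right):
            return candidate
    return "int"


print(common_type("short", "ubyte"))
print(common_type("uint", "long"))
print(common_type("single", "ulong"))
```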
The conditional operator finds a common type for its second and third operand. Future versions of Orth may feature templates that automatically find a common type for two or more parameters.
.count
.
symbol
.address
.count
.
symbol
.address
.count
.
symbol
In general, an access is a reference to a declaration in the same scope, an enclosing scope, or a specific aggregate's scope. If there is no period, then the compiler looks up the symbol starting from the current scope. Otherwise, the compiler evaluates the expression on the left-hand side of the period and searches for the symbol in the aggregate's scope (if the expression evaluates to an aggregate type or aggregate instance) or evaluates the array's count or address (if the expression evaluates to an array type or array instance).
If the expression on the left-hand side of the period is a pointer (but not a pointer type), then the compiler dereferences it repeatedly until it is no longer a pointer.
An access typically yields the value of a variable or the address of a function. Specifically:
sizeof
() expressions.
Inside sizeof
() expressions, the access yields a dummy value whose type matches that of the variable.
In cases 2, 3, and 4, the compiler discards the value (if any) on the left-hand side of the period and prints a warning if the value has side effects.
struct S
{
    int x
    shared int g
}
S foo()
    return S{123}
int a:=foo().g          //warning: the call to foo() might have side effects that won't occur
int b:=typeof(foo()).g  //ok
int c:=(foo(),S.g)      //ok
int d:=sizeof(S.x+1)    //ok, S.x produces a dummy variable whose type is int
|
If the expression on the left-hand side of the period is an array, then address
or count
must appear
after the period. .count
yields the number of elements in the array or array type.
.address
yields the address of the first element of the array and doesn't make sense for array types. .count
and .address
are useful for writing generic
code. [Future versions of Orth will replace .count
and .address
with a conversion
from an array to a user-defined "range" type].
unreachable
The unreachable
keyword instructs the compiler to insert an INT3 interrupt into the executable for debugging.
Like a typeless integer, unreachable has no type. In practice, you will use unreachable in two places. The first is on a separate
line after an if-statement. The second is after the final colon in a chain of conditional expressions.
assert(x==y)            //perform this test only in debug builds
if(x!=y)                //perform this test in debug and release builds
    unreachable
foo( value<0 ? a :
     value>0 ? b :
     unreachable)       //assert that value is nonzero in debug and
                        //release builds
|
sizeof(
value)
→ typeless_integer
sizeof(
Type)
→ typeless_integer
alignof(
value)
→ typeless_integer
alignof(
Type)
→ typeless_integer
typeof(
value)
→ Type
typeof(
Type)
→ Type
Unlike C/C++, sizeof()
and alignof()
yield a typeless integer. The compiler discards the expression
inside the parens. The expression is allowed to have side effects and to access member variables without
an object. The expression inside the parens cannot be a typeless integer or typeless expression. The expression is not allowed to declare
a variable or function.
struct Foo
int x
const int c:=sizeof(Foo.x)
const int d:=sizeof(123) //error: expression inside parens is a typeless integer
const int e:=sizeof(int i) //error: expression inside parens declares i
|
bitcast
(typeless_integer, Type) → value
bitcast
(typeless_expression, Type) → value
bitcast
(value, Type) → value
bitcast()
simply reinterprets the bits of the first
argument as the type specified by the second argument. The type must be nonempty. If the first argument
is a typeless integer or expression, then the type's size must be 1, 2, 4, or 8. The compiler first converts
the typeless integer or expression to the appropriate unsigned type. For instance, bitcast(123,single)
is equivalent to
bitcast(uint(123),single)
because the size of uint
matches the size of single
. If the first argument is
a value, then the value's size must match the size of the second argument.
Unlike the C/C++ reinterpret_cast()
, bitcast()
works with an r-value instead of an address. For instance,
reinterpret_cast<double>(l+1)
causes a compiler error in C++, but compiles correctly in Orth if we substitute bitcast()
for reinterpret_cast()
.
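To make the reinterpretation concrete, here is a minimal Python sketch that mimics bitcast(uint(123),single) and its inverse using the standard `struct` module. The function names are hypothetical, and the sketch assumes little-endian byte order.

```python
import struct


def bitcast_uint_to_single(bits: int) -> float:
    """Reinterpret a 32-bit unsigned pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bitcast_single_to_uint(value: float) -> int:
    """Reinterpret a single-precision float as its 32-bit unsigned pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


print(bitcast_uint_to_single(0x3F80_0000))  # the bit pattern of 1.0
print(bitcast_uint_to_single(123))          # a tiny denormal, not 123.0
print(hex(bitcast_single_to_uint(1.0)))
```

The second call shows why a bitcast differs from a conversion: the bits of the integer 123 form a denormal float far smaller than 1.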
&
value → pointer
Like C/C++, the prefix ampersand (&) operator yields the address of an l-value. When used with an r-value, the compiler generates an error. Unlike C/C++, an ampersand operator combined with a caret operator cancel each other out, yielding the innermost expression. It is therefore possible (but not very useful) to take the address of a dereferenced expression twice, like so:
int^ p
int^^ q:=&(&(p^))
|
Unlike C/C++, Orth assignment operators currently yield r-values to simplify the compiler back-end.
The expressions listed below yield l-values:
bitcast()
operator (if the first operand is an l-value)
^
)
The pointer operator forms a pointer type or dereferences a pointer value. For pointer values, it undoes the address-of operator and vice versa. Dereferencing a pointer to an empty type is a legal NOP.
cdecl(
Type optionalSymbol,
Type optionalSymbol,
...)
→ FunctionType
stdcall(
Type optionalSymbol,
Type optionalSymbol,
...)
→ FunctionType
The result and each parameter must be a type. The compiler ignores the symbol (if any) after each parameter type. The operator creates an external function type that an Orth program needs to interface with other programming languages.
(
nontype)
→ value
(
nontype,
nontype,
...)
→ value
An Orth call uses the same convention as C++. If the expression on the left-hand side
of the paren is a type, then the compiler evaluates the expression inside the parens
and converts it to that type. Orth doesn't support C-style casts (e.g., (int)x
) because
they are hard to parse. If the expression on the left-hand side of the paren is a function,
then the compiler converts each argument to the corresponding parameter type of that function. The number of
arguments must match the number of parameters.
[
word]
→ Type
[
word]
→ value
[
word]
→ value
The compiler first converts the expression inside the brackets to a word. If the first expression is a type, then the second expression must be a compile-time constant between 0 and 0x100000. The result is an array type with the specified element type and element count.
If the first expression is a value, then the result is the array element
indexed by the second expression. The array's element size must be nonzero.
The compiler prints a warning if the index is a compile-time constant that is
out of range (i.e., negative or greater than or equal to the array's element count).
If the first expression is a pointer value, then the subscript is equivalent to (pointer+index)^. As with pointer addition, the compiler multiplies the index by the size of the pointer's base type, adds it to the pointer, and dereferences the result. The pointer's base type cannot be empty.
{
nontype,
nontype,
...}
→ value
Orth compound literals are the same as C compound literals except that the type
isn't parenthesized. Type{expression}
is equivalent to (Type anon:={expression})
except that the former is an r-value whereas the latter is an l-value.
,
typeless_integer → typeless_integer
,
typeless_expression → typeless_expression
,
value → value
The comma operator (also known as the side-effect operator) evaluates the first operand, discards it, and evaluates the second operand. If the right operand is a typeless integer or typeless expression, then the compiler is unable to infer the right operand's type from the left operand. Instead, it creates a placeholder, called a typeless expression, and generates instructions for the placeholder once it's able to infer the type from the comma expression's context. If the left operand has no side effects, the compiler prints a warning. A side effect is any instruction that modifies memory (assignment and increment) or calls a subroutine.
true
?
type1 : type2 → type1
false
?
type1 : type2 → type2
true
?
typeless_integer1 : typeless_integer2 → typeless_integer1
false
?
typeless_integer1 : typeless_integer2 → typeless_integer2
?
typeless : typeless → typeless_expression
?
value : value → value
The form involving two types requires a constant boolean. If the first operand is a constant, the compiler evaluates it at compile time and replaces the conditional with the second operand (if the condition is true) or the third operand (if the condition is false).
If the second and third operands are typeless integers or expressions, the compiler is unable to infer their type. Instead, it creates a placeholder, called a typeless expression, and generates instructions for the placeholder once it's able to infer the type from the conditional expression's context. Otherwise, the compiler converts both operands to a common type and evaluates the conditional at runtime. The result is an l-value iff both operands are l-values.
!
bool → bool
The compiler converts the operand to a boolean and inverts it by xor'ing the value with 1. In practice, this means that all booleans must have the value 0 or 1 to function properly, which might become an issue when exchanging booleans with third-party libraries.
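The following Python sketch shows why noncanonical booleans misbehave under this scheme. It models a stored boolean as a byte value; the function name `orth_not` is hypothetical.

```python
def orth_not(stored_byte: int) -> int:
    """Model of the ! operator: xor the stored byte with 1."""
    return stored_byte ^ 1


for stored in (0x00, 0x01, 0xFF):
    print(hex(stored), "->", hex(orth_not(stored)))
```

For the canonical values 0 and 1 the result flips as expected. For a byte such as 0xFF, which some libraries use for true, the result is still nonzero, so the "inverted" value does not become false.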
-
typeless_integer → typeless_integer
-
typeless_expression → typeless_expression
-
float → float
-
integer → integer
If the operand is a typeless integer, then the result is another typeless integer, and an overflow occurs only if the operand is -0x1_0000_0000_0000_0000. If the operand is a typeless expression, then the result is another typeless expression. The compiler prints a warning if negating a constant signed integer causes an overflow. Since negating an unsigned integer produces an unsigned integer, overflow isn't possible for unsigned integers. The semantics of negation are the same as subtracting the value from zero.
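A small Python sketch of typeless negation, assuming the 65-bit signed range described for typeless integers. The constant and function names are hypothetical.

```python
TYPELESS_MIN = -(1 << 64)      # -0x1_0000_0000_0000_0000
TYPELESS_MAX = (1 << 64) - 1   #  0xffff_ffff_ffff_ffff


def negate_typeless(value: int) -> int:
    """Negate a typeless integer, rejecting results outside the 65-bit range."""
    result = -value
    if not TYPELESS_MIN <= result <= TYPELESS_MAX:
        raise OverflowError("negation overflows the 65-bit signed range")
    return result


print(negate_typeless(TYPELESS_MAX) == -TYPELESS_MAX)
try:
    negate_typeless(TYPELESS_MIN)
except OverflowError as error:
    print("error:", error)
```

Only the minimum value fails, because its magnitude is one larger than the maximum.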
~
typeless_integer → typeless_integer
~
typeless_expression → typeless_expression
~
bool → bool
~
integer → integer
If the operand is a typeless integer, the compiler treats it as a 65-bit two's complement value. The result is
negative if the operand is nonnegative and vice versa. An overflow is not possible.
The ~
and !
operators are identical for boolean operands.
++
float → float
--
float → float
++
→ float
--
→ float
++
integer → integer
--
integer → integer
++
→ integer
--
→ integer
++
pointer → pointer
--
pointer → pointer
++
→ pointer
--
→ pointer
Preincrement, predecrement, postincrement, and postdecrement (collectively called increments) require an l-value. Incrementing a floating-point value adds 1.0 to it; decrementing subtracts 1.0. Incrementing a pointer adds the size of the pointer's base type to its value; decrementing subtracts that size. An error occurs if the pointer's base type is empty.
++
expr is equivalent to expr+=1
and
--
expr is equivalent to expr-=1
. As such, preincrement and predecrement produce r-values.
==
Type → true
or false
!=
Type → true
or false
==
function → bool
!=
function → bool
==
bool → bool
!=
bool → bool
true
or false
op is one of ==
, !=
, <
, <=
, >
, or >=
.
Comparing two types with the ==
operator yields true
if the types exactly match and false
if they do not.
The != operator yields the opposite of the == operator. Other comparisons (<, <=, >, and >=) are illegal for types.
To compare two typeless integers, the compiler treats them as 65-bit signed integers. Hence, a complemented nonnegative integer
is automatically less than an uncomplemented nonnegative value (e.g., ~2 < 1
). To compare a typeless integer as an unsigned integer, use a conversion
to an unsigned type (e.g., ~uint(2) > uint(1)
). The compiler is unable to infer the type of a comparison involving two typeless expressions, so
it simply converts each operand to int
.
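The difference between the typeless and the unsigned comparison can be sketched in Python, whose integers also behave like unbounded two's complement values. The helper names are hypothetical, and `as_uint` models a 32-bit unsigned type by masking.

```python
def complement(value: int) -> int:
    """Typeless complement: negate the value and subtract one."""
    return -value - 1


def as_uint(value: int) -> int:
    """Model a 32-bit unsigned value by keeping the low 32 bits."""
    return value & 0xFFFF_FFFF


# Typeless comparison: ~2 is -3, which is less than 1.
print(complement(2), complement(2) < 1)

# Unsigned comparison: the same bits read as a uint are 0xFFFFFFFD.
print(hex(as_uint(complement(2))), as_uint(complement(2)) > as_uint(1))
```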
Pointers use the same comparisons as unsigned integers with null
being less than every nonnull value.
Functions are equal if their addresses match. Booleans are equal if their bits exactly match. Orth defines 0
as false
, 1 as true
, and all other values as indeterminate. In practice, this means that a third-party library
that represents true
as 0xFF won't be able to communicate with an Orth program.
&&
bool → bool
||
bool → bool
Logical-and and logical-or (collectively called the logical operators) are identical to their C/C++ counterparts. The compiler converts both operands to booleans and
"short-circuits" the execution of the second operand if the first operand is false
(in the case of &&
) or true
(in the case of ||
).
+
typeless_integer → typeless_integer
+
typeless → typeless_expression
+
word → pointer
+
number → number
If both operands are typeless integers, the result is a typeless integer that must fit in the 65-bit signed range
(i.e., [-0x1_0000_0000_0000_0000,0xffff_ffff_ffff_ffff]). Exceeding this range causes a compile-time error.
Otherwise, if both operands are typeless integers or expressions, the result is a typeless expression.
If the left operand is a pointer, the right operand is converted to a word, multiplied
by the size of the pointer's base type, and added to the left operand. The base type must not be empty.
If the left operand is a number, the compiler picks a common type for the two operands and adds them.
The compiler attempts to trap overflow at compile time. Overflow occurs if the sum exceeds the N-bit signed range
(if the common type is a signed N-bit integer) or the N+1-bit signed range (if the common type is an N-bit unsigned integer),
or the double-precision floating-point range (if the common type is double
).
-
typeless_integer → typeless_integer
-
typeless → typeless_expression
-
number → number
-
word → pointer
-
pointer → word
Subtraction is analogous to addition except for the additional form, which converts the two pointer operands to a common type, subtracts them, and divides the difference by the size of the pointer's base type. The base type cannot be empty. The result is a word.
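The scaling in pointer addition and pointer subtraction can be sketched with plain integers standing in for addresses. This is an illustrative Python model only; the function names are hypothetical, and it assumes both pointers in a subtraction point into the same array so the difference divides evenly.

```python
def pointer_add(address: int, index: int, elem_size: int) -> int:
    """pointer + word: scale the index by the base type's size."""
    if elem_size == 0:
        raise ValueError("the base type must not be empty")
    return address + index * elem_size


def pointer_diff(left: int, right: int, elem_size: int) -> int:
    """pointer - pointer: divide the byte difference by the base type's size."""
    if elem_size == 0:
        raise ValueError("the base type must not be empty")
    return (left - right) // elem_size


base = 0x1000
third = pointer_add(base, 3, elem_size=4)   # three 4-byte elements past base
print(hex(third), pointer_diff(third, base, elem_size=4))
```

The two operations are inverses: subtracting the original pointer from the sum recovers the index.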
*
typeless_integer → typeless_integer
*
typeless → typeless_expression
*
number → number
Multiplication is analogous to addition.
/
typeless_integer → typeless_integer
/
typeless → typeless_expression
/
number → number
Integer division uses the same rounding rules as C/C++. Division by zero generates a compile-time error if possible (i.e., the right operand is a typeless integer or a constant).
%
typeless_integer → typeless_integer
%
typeless → typeless_expression
%
integer → integer
The remainder uses the same rounding rules as C/C++. Division by zero generates a compile-time error if possible (i.e., the right operand is a typeless integer or a constant). The remainder is not available for floating point.
@
typeless_integer → typeless_integer
@
typeless → typeless_expression
@
integer → integer
@
bool → bool
Bitwise operations cannot overflow for operands in any form (typeless or typed). If both operands are typeless integers, the compiler treats each operand as a 65-bit two's complement value. The result is negative if either operand (but not both operands) is negative. Otherwise, if both operands are typeless integers or expressions, the result is a typeless expression whose type depends on the surrounding context.
If the left operand is a boolean, the right operand is converted to a boolean, and the result is true if either operand (but not both operands) is true. Since the compiler uses an 8-bit XOR instruction, the operands must be in "canonical" form to work properly (i.e., 0x00 for false and 0x01 for true). Otherwise, the compiler converts both operands to a common type.
Bitwise XOR uses @
instead of ^
to avoid a syntactic ambiguity with the pointer operator.
&
typeless_integer → typeless_integer
&
typeless → typeless_expression
&
integer → integer
&
bool → bool
Bitwise AND is analogous to bitwise XOR. The resulting typeless integer is negative if and only if both typeless-integer operands are negative. If the left operand is a boolean that evaluates to false, the right operand is still evaluated.
|
typeless_integer → typeless_integer
|
typeless → typeless_expression
|
integer → integer
|
bool → bool
Bitwise OR is analogous to bitwise XOR. The resulting typeless integer is negative if and only if either typeless-integer operand is negative. If the left operand is a boolean that evaluates to true, the right operand is still evaluated.
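The sign rules for typeless XOR, AND, and OR follow from two's complement arithmetic, and Python integers behave the same way, so the rules can be spot-checked with a short sketch. The names below are hypothetical and the sample values are arbitrary.

```python
def sign_rules_hold(a: int, b: int) -> bool:
    """Check the stated sign rules for @ (xor), & and | on two typeless integers."""
    xor_ok = ((a ^ b) < 0) == ((a < 0) != (b < 0))   # exactly one operand negative
    and_ok = ((a & b) < 0) == (a < 0 and b < 0)      # both operands negative
    or_ok = ((a | b) < 0) == (a < 0 or b < 0)        # either operand negative
    return xor_ok and and_ok and or_ok


# Arbitrary samples, including both ends of the 65-bit signed range.
SAMPLES = [-(1 << 64), -5, -1, 0, 3, (1 << 64) - 1]
print(all(sign_rules_hold(a, b) for a in SAMPLES for b in SAMPLES))
```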
<<
typeless_integer → typeless_integer
<<
constant_integer → typeless_integer
<<
int1 → typeless_expression
<<
int1 → integer
1signed 32-bit integer
If the first operand is a typeless integer and the second operand is a typeless integer or a constant int, the result is the left operand multiplied by two raised to the right operand. The right operand must be nonnegative. If the result doesn't fit within the signed 65-bit range, then the compiler prints an error. Shifting zero by an arbitrarily large positive value always produces zero.
If the first operand is a typeless integer or expression and the compiler is unable to fold the shift (as above) then the result is a typeless expression whose type depends on the surrounding context.
Unlike the other arithmetic operators, shifting doesn't require the operands to have the same type. The compiler promotes the
left operand to an integer (i.e., 1- and 2-byte integer types become int
; the others are unchanged) and
converts the right operand to an int
.
Shifting by a negative number of bits causes a compiler error (if possible) or produces zero at runtime (because the code generator uses an unsigned shift amount). A shifted N-bit constant must fit within the N-bit unsigned range. That is, the shift can cause a negative value to become positive or vice versa, but the value must not lose any significant bits (i.e., you can undo the left shift with a matching right shift):
const int FIFTEEN:=15
const int foo:=FIFTEEN<<28  //ok, result is negative but no information was lost
const int bar:=FIFTEEN<<29  //error, top bit was lost
|
Shifting a value left by N bits yields the same result as shifting it left by 1 bit N times. In other words, left shifting an N-bit variable by N or more bits makes it zero. The compiler prints a warning if the right operand is a constant that exceeds the number of bits in the left operand.
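The constant-shift check can be modeled in a few lines of Python. This sketch covers only nonnegative constants, as in the FIFTEEN example; the function name `shl_const` is hypothetical.

```python
def shl_const(value: int, shift: int, bits: int = 32) -> int:
    """Left-shift a nonnegative constant of a signed type that is `bits` wide.

    Raises OverflowError if a significant bit would be shifted out.
    """
    pattern = value << shift
    if pattern >= (1 << bits):            # does not fit the N-bit unsigned range
        raise OverflowError("top bit was lost")
    # Reinterpret the N-bit pattern as a signed value.
    if pattern >= (1 << (bits - 1)):
        return pattern - (1 << bits)
    return pattern


foo = shl_const(15, 28)
print(foo)                         # negative, but no information was lost
print((foo & 0xFFFF_FFFF) >> 28)   # a matching right shift recovers 15
try:
    shl_const(15, 29)
except OverflowError as error:
    print("error:", error)
```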
>>
typeless_integer → typeless_integer
>>
constant_integer → typeless_integer
>>
int1 → typeless_expression
>>
int1 → integer
1signed 32-bit integer
If the first operand is a typeless integer and the second operand is a typeless integer or a constant int, the result is the left operand divided by two raised to the right operand rounded down. The right operand must be nonnegative. Right shifting a typeless or typed integer cannot overflow.
If the first operand is a typeless integer or expression and the compiler is unable to fold the shift (as above) then the result is a typeless expression whose type depends on the surrounding context.
Unlike the other arithmetic operators, shifting doesn't require the operands to have the same type. The compiler promotes the
left operand to an integer (i.e., 1- and 2-byte integer types become int; the others are unchanged) and
converts the right operand to an int
.
Shifting by a negative number of bits causes a compiler error (if possible) or evaluates to zero at runtime (because the code generator uses an unsigned shift amount).
Shifting an N-bit signed value to the right by N or more bits copies the sign bit to every other bit. That is, the value becomes zero if it was positive and -1 if it was negative. Shifting an N-bit unsigned value to the right by N or more bits makes it zero. The compiler prints a warning if the right operand is a constant that exceeds the number of bits in the left operand.
For typed integers, the shift is arithmetic if and only if the left operand is signed.
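A minimal Python model of the typed right shift, assuming a nonnegative shift amount (the negative case described above is not modeled). The function name `shr` is hypothetical; Python's `>>` on a negative integer is already an arithmetic shift that rounds down.

```python
def shr(value: int, shift: int, bits: int, signed: bool) -> int:
    """Right-shift an integer of the given width; `value` is its numeric value."""
    if signed:
        # Arithmetic shift: the sign bit is copied into the vacated bits.
        return value >> shift
    # Logical shift: work on the unsigned bit pattern.
    return (value & ((1 << bits) - 1)) >> shift


print(shr(-5, 32, 32, signed=True))            # negative value, shift by N bits
print(shr(5, 32, 32, signed=True))             # positive value, shift by N bits
print(shr(0xFFFF_FFFF, 32, 32, signed=False))  # unsigned value, shift by N bits
```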
:=
nontype → value
The compiler converts the right operand to the left operand's type. The left operand must be an l-value. If the left operand is an empty type (e.g., void), then the assignment is a NOP yielding a trivial instance of the empty type. Otherwise, the assignment copies the bits from the right operand into the left operand and yields the left operand as an r-value.
+=
word → pointer
+=
number → number
-=
word → pointer
-=
number → number
*=
number → number
/=
number → number
%=
integer → integer
@=
bool → bool
@=
integer → integer
&=
bool → bool
&=
integer → integer
|=
bool → bool
|=
integer → integer
<<=
int1 → integer
>>=
int1 → integer
1signed 32-bit integer
The left operand must be an l-value. For division and remainder, the compiler converts both operands to a common type. For left shift and right shift, the compiler converts the right operand to a 32-bit signed integer. For the remaining operators, the compiler converts the right operand to the left operand's type (if the left operand is a number) or to a word (if the left operand is a pointer).
The semantics of each operator are the same as those of the corresponding arithmetic operator. The compiler attempts to trap invalid operations at compile time (division by zero, shifting by a negative number of bits, and shifting by too many bits). Unlike C/C++, each operator yields the left operand as an r-value.
Each statement must appear inside a function's body or the body of another statement. Aggregates and the global context cannot contain statements.
scope
body
The scope
statement is identical to an if(true)
statement.
if(
condition )
body
if(
condition )
body else
body
The semantics are the same as C/C++. The compiler converts the condition to a boolean and executes the
first body if the condition is true or the optional second body if the condition is false. The compiler
optimizes away the if
statement if the condition is a constant, making the if
statement a suitable replacement
for the C/C++ #ifdef
/#endif
directive.
The compiler creates up to three scopes for an if
statement: one for each body and one for the whole statement.
Each body can access declarations inside the condition but statements above and below the if
statement cannot.
Declarations in one body are invisible to the other body.
int k:=i    //error
if( (int i:=foo()) > (int j:=bar()) )
    return i
else
    return j
int k:=j    //error
|
select(
selector )
body
The statement's body must contain only case
statements and up to one else
statement:
case(
value1 ,
value2 ,
... )
body
else
body
The select statement is a specialized version of the C/C++ switch statement. Its purpose is to select one matching case or the default case by comparing the selector with the value (or values) of each case. Although you can achieve the same semantics with an if, else if, else if, ..., else chain, a select statement is more concise and usually more efficient. The compiler optimizes away the select statement if the selector is a constant, making select a suitable replacement for the C/C++ #if...#elif...#endif directive.
The selector must be a number (or, in the future, must be convertible to a number), which undergoes a unary promotion.
Unlike C/C++, the selector can be a floating-point value. Each case value must be a constant. The compiler converts each case value
to the selector's type. Duplicate values are an error. Missing values are currently not an error even if the else
clause is not present (this will probably change for
enumerated types). The else
clause need not be reachable (i.e., it is legal even when the cases already handle every value).
The scope rules are the same as for the if statement. Each case can access declarations in the selector, but statements above and below the select statement cannot.
Declarations inside one case (its values and body) are invisible to other cases.
The order of the cases and else
statement doesn't matter. Multiple else
statements are an error. Statements besides case
and else
are illegal.
A case value can be a closed or half-open range. A closed range uses two periods (e.g., 1..2
means [1,2]). A half-open range uses two periods and a less-than sign
(e.g., 1..<3
means [1,3)).
A range of values is semantically equivalent to a comma-separated list containing every value in the range.
In particular, ranges cannot overlap. The range is unsigned if and only if the selector is unsigned. For example, the range 0xffff_ffff..0
contains two
values, -1 and 0, if the selector's type is int
and is invalid if the selector's type is uint
. Similarly, the range -2..-1
contains two
values, 0xffff_fffe and 0xffff_ffff, if the selector's type is uint
and is invalid if the selector's type is int
. A floating-point range that includes 0.0
automatically includes +0.0 and -0.0. Handling +0.0 in one case and -0.0 in a different case is impossible. A range that includes a NAN or INF follows
the IEEE754 ordering (i.e., -NAN < -INF < -X < 0 < +X < +INF < +NAN ).
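The closed and half-open range notations, and the rule that ranges cannot overlap, can be sketched for integer case values as follows. This Python model is illustrative only: the function names are hypothetical, and it does not model the signed versus unsigned interpretation or floating-point ranges.

```python
def expand(low: int, high: int, half_open: bool = False) -> list[int]:
    """Expand low..high (closed) or low..<high (half-open) into its values."""
    return list(range(low, high if half_open else high + 1))


def find_duplicates(cases: list[list[int]]) -> set[int]:
    """Return every value that appears in more than one place."""
    seen: set[int] = set()
    duplicates: set[int] = set()
    for values in cases:
        for value in values:
            if value in seen:
                duplicates.add(value)
            seen.add(value)
    return duplicates


print(expand(1, 2))                   # 1..2
print(expand(1, 3, half_open=True))   # 1..<3
# case(1..3) and case(3..5) overlap at 3, which would be a duplicate value.
print(find_duplicates([expand(1, 3), expand(3, 5)]))
```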
Implementation Note: The compiler implements the select
as a binary search and/or one or more lookup tables.
while(
condition )
body
The semantics are the same as C/C++. The compiler converts the condition to a boolean and executes the
body so long as the condition is true. The body can access declarations in the condition, but statements
above and below the while
statement cannot. Unlike an if
statement, the condition
is able to access declarations in the body.
while((int x:=foo())>0)
    bar(x)              //ok

bool first:=true
while(first || x==0)    //ok, we initialize x before testing its value (the compiler might
{                       //nonetheless issue a warning)
    first:=false
    int x:=foo()
}

while(foo(&x))          //ok, we can evaluate the address of a local variable before
    int x:=bar()        //initializing it
|
do
body while(
condition )
The semantics of the do
statement are the same as the while
statement except that the compiler evaluates
the condition after executing the body. The condition can access declarations in the body, but statements
above and below the do
statement cannot.
do
    int x:=foo()
while(x>0)                  //ok

bool first:=true
do
{
    foo(first || x==0)      //ok, we initialize x before testing its value (the compiler might
    first:=false            //nonetheless issue a warning)
}
while((int x:=foo())>0)
|
for(
initializer ;
condition ;
increment )
body
The initializer, condition, and increment are optional. A missing condition is equivalent to true
.
Otherwise, the compiler converts the condition to a boolean. The increment, if present, must have a side effect.
A side effect is any instruction that modifies memory (assignment and increment) or calls a subroutine.
Declarations in each part of the statement (initializer, condition, increment, and body) are visible to every other part of the statement and invisible to statements outside of the for statement. Semantically, the for statement is equivalent to:
scope
{
    initializer
top::
    if(!condition)      //If the condition is true, the condition's finalizers execute after the
        goto bottom     //body and increment execute
    body                //The body's finalizers execute after the increment executes
    increment
    goto top
bottom::
}
|
::
::
statement
A label refers to the following statement in the same scope (if any) that isn't an attribute. Placing the label above a statement or in front of it
makes no difference. A label is the target of a break
or continue
(if the label precedes a loop statement) or a goto
.
All labels in a particular function must have a unique name and must not conflict with other declarations in the function (see Scopes).
void foo()
{
    scope
    {
        label::         //label refers to the call to foo()
        int f:=foo()
        label2::        //label2 doesn't refer to any statements because bar() belongs to the
    }                   //enclosing scope
    bar()
    scope
        label::         //error, foo already contains a declaration for label
    goto label          //ok, label is visible even though f isn't
}
|
goto
label
The label must be visible (i.e., must be in the same function as the goto). In future versions of Orth, a goto will not be able to skip the initialization of objects that have finalizers (otherwise, the compiler would have to remember whether to call each finalizer).
break
continue
break
label
continue
label
The specified label must precede an enclosing for
, do
, or while
statement. If the label isn't present, the statement refers
to the nearest for
, do
, or while
statement.
return
return
result
The first form is valid only if the result type is void
. The second form is valid for all result types (including void
).
Like C++, returning an instance of void
is a valid NOP.