4.7. Basic Types

4.7.1. Scalar Types

4.7.1.1. Integer Types

There are signed and unsigned integers for various bitsizes

Byte: 8 bit unsigned integer
Int32, UInt32: 32 bit signed and unsigned integer
Char: 32 bit unicode code point
Int64, UInt64: 64 bit signed and unsigned integer
Int, UInt: architecture dependent signed and unsigned integer

4.7.1.1.1. Semantics

The semantics of builtin unsigned and signed integers is defined via an embedding into ℕ or ℤ. This embedding is defined by an embedding function and a proof that it is an embedding (i.e. it is injective).

In the following we show the necessary definitions for UInt32.

UInt32.toNatural:   UInt32 → ℕ
UInt32.fromNatural: ℕ → UInt32        -- modulo 2^32

UInt32.embedded: ∀ n: fromNatural (toNatural n) = n
UInt32.embedded: ∀ n m: toNatural n = toNatural m → n = m

UInt32.(≤) (n m: UInt32): Prop :=
    toNatural n ≤ toNatural m

UInt32.(≤?) (n m: UInt32): Bool

Unit32.bitSize: ℕ      -- bitsize is 'n + 1', cannot be zero

UInt32.(+) (n m: UInt32): UInt32 :=
    fromNatural (toNatural n + toNatural m)

UInt32.(-) (n m: UInt32): UInt32 :=
    fromNatural (toNatural n + 2^(succ bitsize)- toNatural m)

4.7.1.1.2. Compile to Javascript

For the node platform and the browser, scalar values up to the bitsize of 32 can be represented as javascript numbers. 64 bit scalars have no direct representation in javascript. We have to generate an object with two 32 bit sized numbers.

This workaround is necessary although javascript numbers are 64 bit floating point values. However it is not possible to do 64 bit integer arithmetic in javascript on 64 bit floating point values.

With the x|0 annotation we can force javascript to do signed 32 bit integer arithmetics on javascript numbers. The expression x >> 0 converts 32 bit integer as well. x >>> 0 converts to an unsigned 32 bit integer (i.e. -1 >>> 0 is converted to 0xff_ff_ff_ff).

Signed and unsigned integer arithmetic is the same. Only the javascript comparison operators <=, <, … give wrong results. Before doing the comparisons, it is necessary to add the lowest negative number 0x8000_0000 which is \(-2^{31}\). This shifts the number zero to the lowest negative number, i.e. all other numbers are greater or equal to this number.

4.7.1.1.3. Compile to Machine Code

If compiling to machine code (e.g. via LLVM or Rust) the situation is different.

Scalar types can be allocated on the stack. This is possible to bitsizes up to 128 (or maybe in LLVM even more).

The code is fastest if all scalar objects are allocated on the stack and scalar objects within other objects are completely within the surrounding object. I.e. there are no pointers to scalar objects (they are unboxed). This creates two possible problems:

Garbage collection:

Pointer occupy a machine word and the machine number occupies a machine word as well. The runtime cannot distinguish between a machine number and a pointer into the heap.

Ocaml resolved this problem by making the machine numbers of size \(2^{31}\) or \(2^{63}\) and representing the number \(i\) by the number \(2i + 1\). Therefore in machine numbers the least significant bit has always the value 1. Since heap locations are always word aligned the corresponding pointers have a least significant bit of 0. The garbage collector can recognize pointer into the heap by looking at the least significant digit.

Polymorphic Functions:

Generic functions on objects pointing into the heap need only one machine code representation for all its possible types.

The most efficient and closest to compilable mainstream languages would be to represent all scalar types which can fit into a machine word by the corresponding machine word and represent scalar types which cannot fit into a machine word (e.g. Int64 on 32 bit machines) by pointer to a boxed value on the heap.

Polymorphic arrays are then always sequences of machine words. Character arrays on 64 bit machines need 64 bits per character (however strings remain packed).

The garbage collector needs type information. It cannot know by just looking at a word if it represents a scalar value or a pointer into the heap. It has to know the layout of each stack frame and the layout of all objects on the heap.

Note

More detailed analysis needed!