4.7. Basic Types
4.7.1. Scalar Types
4.7.1.1. Integer Types
There are signed and unsigned integers for various bitsizes
Byte8 bit unsigned integer
Int32, UInt3232 bit signed and unsigned integer
Char32 bit unicode code point
Int64, UInt6464 bit signed and unsigned integer
Int, UIntarchitecture dependent signed and unsigned integer
4.7.1.1.1. Semantics
The semantics of builtin unsigned and signed integers is defined via an embedding into ℕ or ℤ. This embedding is defined by an embedding function and a proof that it is an embedding (i.e. it is injective).
In the following we show the necessary definitions for UInt32.
UInt32.toNatural: UInt32 → ℕ
UInt32.fromNatural: ℕ → UInt32 -- modulo 2^32
UInt32.embedded: ∀ n: fromNatural (toNatural n) = n
UInt32.embedded: ∀ n m: toNatural n = toNatural m → n = m
UInt32.(≤) (n m: UInt32): Prop :=
toNatural n ≤ toNatural m
UInt32.(≤?) (n m: UInt32): Bool
Unit32.bitSize: ℕ -- bitsize is 'n + 1', cannot be zero
UInt32.(+) (n m: UInt32): UInt32 :=
fromNatural (toNatural n + toNatural m)
UInt32.(-) (n m: UInt32): UInt32 :=
fromNatural (toNatural n + 2^(succ bitsize)- toNatural m)
4.7.1.1.2. Compile to Javascript
For the node platform and the browser, scalar values up to the bitsize of 32 can be represented as javascript numbers. 64 bit scalars have no direct representation in javascript. We have to generate an object with two 32 bit sized numbers.
This workaround is necessary although javascript numbers are 64 bit floating point values. However it is not possible to do 64 bit integer arithmetic in javascript on 64 bit floating point values.
With the x|0 annotation we can force javascript to do signed 32 bit integer
arithmetics on javascript numbers. The expression x >> 0 converts 32 bit
integer as well. x >>> 0 converts to an unsigned 32 bit integer (i.e. -1
>>> 0 is converted to 0xff_ff_ff_ff).
Signed and unsigned integer arithmetic is the same. Only the javascript
comparison operators <=, <, … give wrong results. Before doing the
comparisons, it is necessary to add the lowest negative number
0x8000_0000 which is \(-2^{31}\). This shifts the number zero to the
lowest negative number, i.e. all other numbers are greater or equal to this
number.
4.7.1.1.3. Compile to Machine Code
If compiling to machine code (e.g. via LLVM or Rust) the situation is different.
Scalar types can be allocated on the stack. This is possible to bitsizes up to 128 (or maybe in LLVM even more).
The code is fastest if all scalar objects are allocated on the stack and scalar objects within other objects are completely within the surrounding object. I.e. there are no pointers to scalar objects (they are unboxed). This creates two possible problems:
- Garbage collection:
Pointer occupy a machine word and the machine number occupies a machine word as well. The runtime cannot distinguish between a machine number and a pointer into the heap.
Ocaml resolved this problem by making the machine numbers of size \(2^{31}\) or \(2^{63}\) and representing the number \(i\) by the number \(2i + 1\). Therefore in machine numbers the least significant bit has always the value 1. Since heap locations are always word aligned the corresponding pointers have a least significant bit of 0. The garbage collector can recognize pointer into the heap by looking at the least significant digit.
- Polymorphic Functions:
Generic functions on objects pointing into the heap need only one machine code representation for all its possible types.
The most efficient and closest to compilable mainstream languages would be to
represent all scalar types which can fit into a machine word by the
corresponding machine word and represent scalar types which cannot fit into a
machine word (e.g. Int64 on 32 bit machines) by pointer to a boxed value on
the heap.
Polymorphic arrays are then always sequences of machine words. Character arrays on 64 bit machines need 64 bits per character (however strings remain packed).
The garbage collector needs type information. It cannot know by just looking at a word if it represents a scalar value or a pointer into the heap. It has to know the layout of each stack frame and the layout of all objects on the heap.
Note
More detailed analysis needed!