Virtual Machine
- A simple computer can be modelled as a record holding an instruction pointer, a register file, and a block of memory.
- Each instruction is a custom type variant.
- An assembler encodes variants into integers and an executor decodes and runs them.
- The fetch-decode-execute cycle is
a tail-recursive function over an immutable
Machinerecord. - Separating the assembler (pure encoding) from the executor (pure simulation) makes each part independently testable.
Why Build a VM?
- A virtual machine is a program that runs other programs
- Understanding one reveals how real CPUs work: fetch an instruction, decode its parts, execute the operation, repeat
- This lesson builds a minimal VM with four registers, a flat integer memory, and five instructions
- The VM connects several ideas covered earlier:
- Custom types and pattern matching (the instruction type)
- Recursion over immutable data (the execute loop)
- Bit manipulation (packing operands into integers)
- Separating pure logic from side effects (no IO inside
execute)
Types
- Two types capture the machine state and the instruction set:
pub type Op {
Halt
Load(Int, Int)
Add(Int, Int, Int)
Jump(Int)
JumpIfZero(Int, Int)
}
pub type Machine {
Machine(ip: Int, regs: List(Int), ram: List(Int))
}
Oplists every instruction; each variant carries its operandsLoad(reg, val)puts a literal value into a registerAdd(dst, a, b)storesregs[a] + regs[b]intoregs[dst]Jump(addr)sets the instruction pointer toaddrJumpIfZero(reg, addr)jumps only ifregs[reg] == 0Haltstops the machine
Machineholds the full CPU state:ipis the instruction pointer (index intoram)regsis the register file, aList(Int)of lengthnum_regsramis the program encoded as aList(Int)
num_regs = 4is a named constant- Using a name rather than a bare
4makes the intent clear - It appears in both the initialization and the tests without duplication
- Using a name rather than a bare
Assembler
- The assembler converts a list of
Opvalues to a list of integers:
pub fn assemble(ops: List(Op)) -> List(Int) {
list.map(ops, encode)
}
fn encode(op: Op) -> Int {
case op {
Halt -> 0
Load(reg, val) -> 1 * field_size * field_size + reg * field_size + val
Add(dst, a, b) -> 2 * field_size * field_size + dst * field_size + a * 16 + b
Jump(addr) -> 3 * field_size * field_size + addr
JumpIfZero(reg, addr) -> 4 * field_size * field_size + reg * field_size + addr
}
}
- Each instruction is packed into a single 24-bit integer
- Bits 23-16: opcode (0 = Halt, 1 = Load, 2 = Add, 3 = Jump, 4 = JumpIfZero)
- Bits 15-8: first operand
- Bits 7-0: second operand
Load(0, 10)encodes as1 * field_size * field_size + 0 * field_size + 10 = 65546field_size = 256is a named constant representing one byte per instruction field; using it instead of bare256makes the encoding scheme explicit
Executor
- The executor is a single recursive function:
pub fn execute(machine: Machine) -> Machine {
case list_at(machine.ram, machine.ip) {
Error(_) -> machine
Ok(encoded) -> {
let op = decode(encoded)
case op {
Halt -> machine
Load(reg, val) -> {
let new_regs = set_reg(machine.regs, reg, val)
execute(Machine(machine.ip + 1, new_regs, machine.ram))
}
Add(dst, a, b) -> {
let va = get_reg(machine.regs, a)
let vb = get_reg(machine.regs, b)
let new_regs = set_reg(machine.regs, dst, va + vb)
execute(Machine(machine.ip + 1, new_regs, machine.ram))
}
Jump(addr) ->
execute(Machine(addr, machine.regs, machine.ram))
JumpIfZero(reg, addr) -> {
case get_reg(machine.regs, reg) {
0 -> execute(Machine(addr, machine.regs, machine.ram))
_ -> execute(Machine(machine.ip + 1, machine.regs, machine.ram))
}
}
}
}
}
}
list_at(machine.ram, machine.ip)fetches the current instructionError(_)fires when the IP runs off the end: the machine haltsHaltstops execution cleanly and returns the current machine- Each other instruction builds a new
Machinewith updated state and recurses JumpIfZerobranches: if the register is zero, set IP toaddr, otherwise advance by one- This is tail recursion again:
every path either returns the machine or calls
executeas the last action- The Gleam compiler optimizes this to a loop, so even a long-running program will not overflow the call stack
Running the Example
let program = [
Load(0, 10),
Load(1, 20),
Add(2, 0, 1),
Halt,
]
let machine = Machine(ip: 0, regs: list.repeat(0, num_regs), ram: assemble(program))
let result = execute(machine)
io.println(string.inspect(result.regs))
Load(0, 10)puts 10 in register 0;Load(1, 20)puts 20 in register 1Add(2, 0, 1)storesregs[0] + regs[1]in register 2Haltstops execution; final register state is[10, 20, 30, 0]
The disassembler converts the encoded program back to readable text:
load r0 10
load r1 20
add r2 r0 r1
halt
Testing
pub fn add_program_test() {
let program = [Load(0, 5), Load(1, 3), Add(2, 0, 1), Halt]
let machine = Machine(ip: 0, regs: list.repeat(0, 4), ram: assemble(program))
let result = execute(machine)
at(result.regs, 2)
|> should.equal(Ok(8))
}
pub fn halt_at_start_test() {
let program = [Halt]
let machine = Machine(ip: 0, regs: list.repeat(0, 4), ram: assemble(program))
let result = execute(machine)
result.ip
|> should.equal(0)
}
pub fn jump_if_zero_skips_when_nonzero_test() {
// Load 1 into r0, then JumpIfZero r0 addr=0 should NOT jump
let program = [Load(0, 1), JumpIfZero(0, 0), Load(1, 42), Halt]
let machine = Machine(ip: 0, regs: list.repeat(0, 4), ram: assemble(program))
let result = execute(machine)
at(result.regs, 1)
|> should.equal(Ok(42))
}
add_program_testassembles and runs a simple addition program, then checks that register 2 holds the sumhalt_at_start_testconfirms thatHaltat IP 0 leaves the machine at IP 0jump_if_zero_skips_when_nonzero_testchecks thatJumpIfZerodoes not branch when the register is non-zero, allowing the next instruction to execute
Check Understanding
Why encode instructions as integers?
A real CPU stores programs as bytes in memory.
Using a List(Int) here mimics that:
the program and data live in the same address space,
and the instruction pointer is just an index.
This makes it straightforward to write self-modifying programs
(though doing so is a well-known source of bugs).
An alternative is to keep the List(Op) as the program representation,
which is simpler but means the program and data are separate.
This is less realistic but easier to reason about.
The execute function calls itself on every non-Halt instruction.
Why does Gleam not run out of stack space for a long program?
Because every recursive call to execute is a tail call:
execute is the last thing each branch does, with no further computation pending.
The Gleam compiler replaces tail calls with a jump back to the top of the function,
reusing the same stack frame.
This is equivalent to a while loop and uses O(1) stack space
regardless of how many instructions the program runs.
Exercises
Sum 1 to 5 (20 minutes)
Write a program in List(Op) that computes 1 + 2 + 3 + 4 + 5 = 15
using a loop and JumpIfZero.
Use register 0 as the counter (start at 5, decrement each iteration)
and register 1 as the accumulator.
Halt when the counter reaches zero.
Run it with execute and assert regs[1] == 15.
Disassembler (15 minutes)
Write disassemble(words: List(Int)) -> List(String) that converts
encoded words to strings like "load r0 10" and "add r2 r0 r1".
Add one test per instruction type.
Multiply instruction (15 minutes)
Add Mul(dst, a, b) to Op.
Choose an unused opcode (e.g. 5).
Update encode, decode, execute, and (if you wrote it) the disassembler.
Write a test that loads 6 and 7 into registers, multiplies them,
and asserts the result is 42.
Step limit (10 minutes)
Add a max_steps: Int field to Machine.
Decrement it on each instruction; when it reaches zero, halt.
Change the return type to Result(Machine, String).
Write a test that confirms a looping program returns an error after the
step limit is reached.