Property-Based Testing

Unit tests check specific examples; property-based tests check universal statements that must hold for any generated input.
A generator is a function from a seed to a value and a new seed, enabling reproducible pseudo-random generation.
Composing generators with list_of and other combinators builds generators for complex types from simple ones.
check runs a property against many generated values and returns the first counterexample it finds.
Shrinking reduces a failing input to the smallest version that still triggers the failure; this lesson explains the concept without implementing it.

Why Properties?

A test like let result = word_count(["the", "the"]); let assert True = result == dict.from_list([#("the", 2)]) checks one input
A property like "for all word lists, the total count equals the list length" checks an infinite family of inputs
Property-based testing was popularized by Haskell's QuickCheck; today every major language has a property testing library
Properties catch bugs that no one thought to add as an example: the sorting code that fails for lists of length 1, the encoder that drops the last byte when the input length is a multiple of 8
This lesson builds a minimal property tester from scratch to expose the mechanics; real projects would use an established library

Seeds and the Generator Type

// A seed for the pseudo-random number generator.
// Any integer is a valid seed.
pub type Seed {
  Seed(Int)
}

// A generator produces a value and a new seed.
pub type Gen(a) =
  fn(Seed) -> #(a, Seed)

// Advance the seed using a linear congruential generator.
// Multiplier and increment are from Numerical Recipes.
pub fn next(seed: Seed) -> #(Int, Seed) {
  let Seed(n) = seed
  let raw = n * 1_664_525 + 1_013_904_223
  let next_n = case raw < 0 {
    True -> -raw
    False -> raw
  }
  #(next_n, Seed(next_n))
}

A Seed wraps a single integer; any integer is a valid starting seed
next advances the seed using a linear congruential generator (LCG): multiply and add, then take the absolute value to stay positive
Because Gleam runs on the BEAM and Erlang uses arbitrary-precision integers, there is no overflow; the numbers just grow until abs brings them back
Gen(a) is a type alias for a function that takes a seed and returns a generated value paired with the next seed
The seed-threading style means generators are pure and reproducible: the same seed always produces the same sequence

Basic Generators

pub fn int_between(lo: Int, hi: Int) -> Gen(Int) {
  fn(seed) {
    let #(raw, next_seed) = next(seed)
    let range = hi - lo + 1
    let val = lo + raw % range
    #(val, next_seed)
  }
}

pub fn list_of(gen: Gen(a), count: Int) -> Gen(List(a)) {
  fn(seed) { do_list_of(gen, count, seed, []) }
}

fn do_list_of(
  gen: Gen(a),
  remaining: Int,
  seed: Seed,
  acc: List(a),
) -> #(List(a), Seed) {
  case remaining {
    0 -> #(list.reverse(acc), seed)
    _ -> {
      let #(val, next_seed) = gen(seed)
      do_list_of(gen, remaining - 1, next_seed, [val, ..acc])
    }
  }
}

int_between(lo, hi) calls next to get a raw integer, then maps it into the range [lo, hi] using %
list_of(gen, count) calls gen exactly count times, threading the seed forward on each call, and collects the values into a list
do_list_of is the tail-recursive worker: it prepends values to acc and reverses at the end, keeping each step O(1)
Composing generators is the normal way to build generators for complex types: list_of(int_between(-100, 100), 10) is a generator for ten-element lists of integers

The Check Function

// Run a property against `trials` generated values starting from `seed`.
// Returns Ok(Nil) if all trials pass, or Error(failing_value) on the first failure.
pub fn check(
  gen: Gen(a),
  prop: fn(a) -> Bool,
  trials: Int,
  seed: Seed,
) -> Result(Nil, a) {
  do_check(gen, prop, trials, seed)
}

fn do_check(
  gen: Gen(a),
  prop: fn(a) -> Bool,
  remaining: Int,
  seed: Seed,
) -> Result(Nil, a) {
  case remaining {
    0 -> Ok(Nil)
    _ -> {
      let #(val, next_seed) = gen(seed)
      case prop(val) {
        True -> do_check(gen, prop, remaining - 1, next_seed)
        False -> Error(val)
      }
    }
  }
}

check runs the property against trials generated values
do_check is the recursive worker: generate a value, test the property, recurse if it passes, return Error(val) if it fails
Returning Error(val) rather than just Error(Nil) lets the caller see the specific counterexample that broke the property
A real property testing library would also shrink the counterexample: repeatedly simplify val until no simpler version still fails, giving a minimal, readable test case

Running the Example

  let seed = Seed(42)
  let int_gen = int_between(-50, 50)
  let list_gen = list_of(int_between(-100, 100), 8)

  let reverse_twice =
    check(list_gen, fn(xs) { list.reverse(list.reverse(xs)) == xs }, 200, seed)
  io.println("reverse twice: " <> string.inspect(reverse_twice))

  let sort_idempotent =
    check(
      list_gen,
      fn(xs) {
        let sorted = list.sort(xs, int.compare)
        list.sort(sorted, int.compare) == sorted
      },
      200,
      seed,
    )
  io.println("sort idempotent: " <> string.inspect(sort_idempotent))

  // This property is false: not every integer is non-negative.
  let always_positive = check(int_gen, fn(n) { n >= 0 }, 200, seed)
  io.println(
    "all non-negative (should fail): " <> string.inspect(always_positive),
  )

reverse_twice passes because list.reverse(list.reverse(xs)) == xs for every list
sort_idempotent passes because sorting an already-sorted list is a no-op
always_positive fails immediately: the generator produces both positive and negative integers, so the first negative value that appears is the counterexample

Testing

pub fn next_changes_seed_test() {
  let #(_, s2) = next(Seed(1))
  let #(_, s3) = next(s2)
  s2 |> should.not_equal(s3)
}

pub fn int_between_in_range_test() {
  let gen = int_between(1, 10)
  let #(val, _) = gen(Seed(99))
  { val >= 1 && val <= 10 } |> should.be_true()
}

pub fn int_between_deterministic_test() {
  let gen = int_between(0, 100)
  let #(a, _) = gen(Seed(7))
  let #(b, _) = gen(Seed(7))
  a |> should.equal(b)
}

pub fn list_of_length_test() {
  let gen = list_of(int_between(0, 10), 5)
  let #(result, _) = gen(Seed(1))
  list.length(result) |> should.equal(5)
}

pub fn reverse_twice_is_identity_test() {
  let gen = list_of(int_between(-100, 100), 10)
  check(gen, fn(xs) { list.reverse(list.reverse(xs)) == xs }, 100, Seed(42))
  |> should.equal(Ok(Nil))
}

next_changes_seed_test confirms that the generator advances: applying next twice gives two different seeds
int_between_in_range_test confirms the range constraint holds for one seed
int_between_deterministic_test confirms that the same seed always produces the same value, which is the reproducibility guarantee
check_finds_counterexample_test verifies that check returns Error for a property that is demonstrably false

Check Understanding

Why does check return Error(val) where val is the failing input, rather than just Error("property failed")?

Knowing which value triggered the failure is the whole point of property testing. A bare error message tells you the test failed; the failing value tells you what to investigate. In a real library, you would then pass that value to a shrinker to find the minimal failing case. Returning Error(val) also keeps the return type polymorphic in a, meaning check works for any generator without needing a separate error type.

The int_between(0, 10) generator uses raw % range to stay in bounds. What goes wrong if lo > hi?

range = hi - lo + 1 becomes zero or negative. Division by zero in Erlang raises a badarith exception, crashing the process. A production generator would validate that lo <= hi and return Error or panic with a useful message. For this lesson, the contract is that lo <= hi; violating it is a programming error, not a runtime event to handle.

Exercises

String generator (15 minutes)

Write string_of(chars: List(String), len: Int) -> Gen(String) that generates a string of exactly len characters chosen randomly from chars. Use list_of and int_between(0, list.length(chars) - 1) as the character picker. Write a property test confirming that every generated string has the right length.

Roundtrip property (15 minutes)

Using int_between, write a property test for the encode/decode roundtrip from the binary lesson: for any integer in the range that the packer supports, decode(encode(n)) == n. Run 500 trials.

Shrinking (20 minutes)

Add a shrink_int(n: Int) -> List(Int) function that returns a list of smaller candidate values to try: [0, n / 2, n - 1] (filtered to remove duplicates and values equal to n). Modify do_check to call shrink_int on the first failing value and keep shrinking as long as smaller values also fail, returning the smallest failing value found. Test that a property like n < 3 produces a counterexample of exactly 3.

Pair generator (10 minutes)

Write pair_of(gen_a: Gen(a), gen_b: Gen(b)) -> Gen(#(a, b)) that generates a pair by running both generators in sequence, threading the seed through. Use it to write a property test confirming that dict.merge(a, b) always contains at least as many keys as dict.size(a).