Property-Based Testing
- Unit tests check specific examples; property-based tests check universal statements that must hold for any generated input.
- A generator is a function from a seed to a value and a new seed, enabling reproducible pseudo-random generation.
- Composing generators with
list_ofand other combinators builds generators for complex types from simple ones. checkruns a property against many generated values and returns the first counterexample it finds.- Shrinking reduces a failing input to the smallest version that still triggers the failure; this lesson explains the concept without implementing it.
Why Properties?
- A test like
word_count(["the", "the"]) |> should.equal(dict.from_list([#("the", 2)]))checks one input - A property like "for all word lists, the total count equals the list length" checks an infinite family of inputs
- Property-based testing was popularized by Haskell's QuickCheck; today every major language has a property testing library
- Properties catch bugs that no one thought to add as an example: the sorting code that fails for lists of length 1, the encoder that drops the last byte when the input length is a multiple of 8
- This lesson builds a minimal property tester from scratch to expose the mechanics; real projects would use an established library
Seeds and the Generator Type
// A seed for the pseudo-random number generator.
// Any integer is a valid seed.
pub type Seed {
Seed(Int)
}
// A generator produces a value and a new seed.
pub type Gen(a) =
fn(Seed) -> #(a, Seed)
// Advance the seed using a linear congruential generator.
// Multiplier and increment are from Numerical Recipes.
pub fn next(seed: Seed) -> #(Int, Seed) {
let Seed(n) = seed
let raw = n * 1_664_525 + 1_013_904_223
let next_n = case raw < 0 {
True -> -raw
False -> raw
}
#(next_n, Seed(next_n))
}
- A
Seedwraps a single integer; any integer is a valid starting seed nextadvances the seed using a linear congruential generator (LCG): multiply and add, then take the absolute value to stay positive- Because Gleam runs on the BEAM and Erlang uses arbitrary-precision integers,
there is no overflow; the numbers just grow until
absbrings them back Gen(a)is a type alias for a function that takes a seed and returns a generated value paired with the next seed- The seed-threading style means generators are pure and reproducible: the same seed always produces the same sequence
Basic Generators
pub fn int_between(lo: Int, hi: Int) -> Gen(Int) {
fn(seed) {
let #(raw, next_seed) = next(seed)
let range = hi - lo + 1
let val = lo + raw % range
#(val, next_seed)
}
}
pub fn list_of(gen: Gen(a), count: Int) -> Gen(List(a)) {
fn(seed) { do_list_of(gen, count, seed, []) }
}
fn do_list_of(
gen: Gen(a),
remaining: Int,
seed: Seed,
acc: List(a),
) -> #(List(a), Seed) {
case remaining {
0 -> #(list.reverse(acc), seed)
_ -> {
let #(val, next_seed) = gen(seed)
do_list_of(gen, remaining - 1, next_seed, [val, ..acc])
}
}
}
int_between(lo, hi)callsnextto get a raw integer, then maps it into the range[lo, hi]using%list_of(gen, count)callsgenexactlycounttimes, threading the seed forward on each call, and collects the values into a listdo_list_ofis the tail-recursive worker: it prepends values toaccand reverses at the end, keeping each step O(1)- Composing generators is the normal way to build generators for complex types:
list_of(int_between(-100, 100), 10)is a generator for ten-element lists of integers
The Check Function
// Run a property against `trials` generated values starting from `seed`.
// Returns Ok(Nil) if all trials pass, or Error(failing_value) on the first failure.
pub fn check(
gen: Gen(a),
prop: fn(a) -> Bool,
trials: Int,
seed: Seed,
) -> Result(Nil, a) {
do_check(gen, prop, trials, seed)
}
fn do_check(
gen: Gen(a),
prop: fn(a) -> Bool,
remaining: Int,
seed: Seed,
) -> Result(Nil, a) {
case remaining {
0 -> Ok(Nil)
_ -> {
let #(val, next_seed) = gen(seed)
case prop(val) {
True -> do_check(gen, prop, remaining - 1, next_seed)
False -> Error(val)
}
}
}
}
checkruns the property againsttrialsgenerated valuesdo_checkis the recursive worker: generate a value, test the property, recurse if it passes, returnError(val)if it fails- Returning
Error(val)rather than justError(Nil)lets the caller see the specific counterexample that broke the property - A real property testing library would also shrink the counterexample:
repeatedly simplify
valuntil no simpler version still fails, giving a minimal, readable test case
Running the Example
let seed = Seed(42)
let int_gen = int_between(-50, 50)
let list_gen = list_of(int_between(-100, 100), 8)
let reverse_twice =
check(list_gen, fn(xs) { list.reverse(list.reverse(xs)) == xs }, 200, seed)
io.println("reverse twice: " <> string.inspect(reverse_twice))
let sort_idempotent = check(
list_gen,
fn(xs) {
let sorted = list.sort(xs, int.compare)
list.sort(sorted, int.compare) == sorted
},
200,
seed,
)
io.println("sort idempotent: " <> string.inspect(sort_idempotent))
// This property is false: not every integer is non-negative.
let always_positive = check(int_gen, fn(n) { n >= 0 }, 200, seed)
io.println(
"all non-negative (should fail): " <> string.inspect(always_positive),
)
reverse_twicepasses becauselist.reverse(list.reverse(xs)) == xsfor every listsort_idempotentpasses because sorting an already-sorted list is a no-opalways_positivefails immediately: the generator produces both positive and negative integers, so the first negative value that appears is the counterexample
Testing
pub fn next_changes_seed_test() {
let #(_, s2) = next(Seed(1))
let #(_, s3) = next(s2)
s2 |> should.not_equal(s3)
}
pub fn int_between_in_range_test() {
let gen = int_between(1, 10)
let #(val, _) = gen(Seed(99))
{ val >= 1 && val <= 10 } |> should.be_true()
}
pub fn int_between_deterministic_test() {
let gen = int_between(0, 100)
let #(a, _) = gen(Seed(7))
let #(b, _) = gen(Seed(7))
a |> should.equal(b)
}
pub fn list_of_length_test() {
let gen = list_of(int_between(0, 10), 5)
let #(result, _) = gen(Seed(1))
list.length(result) |> should.equal(5)
}
pub fn reverse_twice_is_identity_test() {
let gen = list_of(int_between(-100, 100), 10)
check(gen, fn(xs) { list.reverse(list.reverse(xs)) == xs }, 100, Seed(42))
|> should.equal(Ok(Nil))
}
pub fn sort_idempotent_test() {
let gen = list_of(int_between(-100, 100), 10)
check(
gen,
fn(xs) {
let sorted = list.sort(xs, int.compare)
list.sort(sorted, int.compare) == sorted
},
100,
Seed(42),
)
|> should.equal(Ok(Nil))
}
pub fn check_finds_counterexample_test() {
// Claiming every integer in [0..10] is > 5 is false.
let gen = int_between(0, 10)
check(gen, fn(n) { n > 5 }, 200, Seed(1))
|> should.be_error()
}
pub fn check_passes_when_true_test() {
let gen = int_between(1, 100)
check(gen, fn(n) { n > 0 }, 200, Seed(42))
|> should.equal(Ok(Nil))
}
next_changes_seed_testconfirms that the generator advances: applyingnexttwice gives two different seedsint_between_in_range_testconfirms the range constraint holds for one seedint_between_deterministic_testconfirms that the same seed always produces the same value, which is the reproducibility guaranteecheck_finds_counterexample_testverifies thatcheckreturnsErrorfor a property that is demonstrably false
Check Understanding
Why does check return Error(val) where val is the failing input,
rather than just Error("property failed")?
Knowing which value triggered the failure is the whole point of property testing.
A bare error message tells you the test failed;
the failing value tells you what to investigate.
In a real library, you would then pass that value to a shrinker
to find the minimal failing case.
Returning Error(val) also keeps the return type polymorphic in a,
meaning check works for any generator without needing a separate error type.
The int_between(0, 10) generator uses raw % range to stay in bounds.
What goes wrong if lo > hi?
range = hi - lo + 1 becomes zero or negative.
Division by zero in Erlang raises a badarith exception,
crashing the process.
A production generator would validate that lo <= hi and return Error
or panic with a useful message.
For this lesson, the contract is that lo <= hi;
violating it is a programming error, not a runtime event to handle.
Exercises
String generator (15 minutes)
Write string_of(chars: List(String), len: Int) -> Gen(String)
that generates a string of exactly len characters chosen randomly from chars.
Use list_of and int_between(0, list.length(chars) - 1) as the character picker.
Write a property test confirming that every generated string has the right length.
Roundtrip property (15 minutes)
Using int_between, write a property test for the encode/decode roundtrip
from the binary lesson:
for any integer in the range that the packer supports,
decode(encode(n)) == n.
Run 500 trials.
Shrinking (20 minutes)
Add a shrink_int(n: Int) -> List(Int) function that returns a list of smaller
candidate values to try: [0, n / 2, n - 1]
(filtered to remove duplicates and values equal to n).
Modify do_check to call shrink_int on the first failing value
and keep shrinking as long as smaller values also fail,
returning the smallest failing value found.
Test that a property like n < 3 produces a counterexample of exactly 3.
Pair generator (10 minutes)
Write pair_of(gen_a: Gen(a), gen_b: Gen(b)) -> Gen(#(a, b))
that generates a pair by running both generators in sequence,
threading the seed through.
Use it to write a property test confirming that dict.merge(a, b)
always contains at least as many keys as dict.size(a).