Property-Based Testing

Why Properties?

Seeds and the Generator Type

i
// A seed for the pseudo-random number generator.
// Any integer is a valid seed.
pub type Seed {
    Seed(Int)
}

// A generator produces a value and a new seed.
pub type Gen(a) =
    fn(Seed) -> #(a, Seed)
i
// Advance the seed using a linear congruential generator.
// Multiplier and increment are from Numerical Recipes.
pub fn next(seed: Seed) -> #(Int, Seed) {
    let Seed(n) = seed
    let raw = n * 1_664_525 + 1_013_904_223
    let next_n = case raw < 0 {
        True -> -raw
        False -> raw
    }
    #(next_n, Seed(next_n))
}

Basic Generators

i
pub fn int_between(lo: Int, hi: Int) -> Gen(Int) {
    fn(seed) {
        let #(raw, next_seed) = next(seed)
        let range = hi - lo + 1
        let val = lo + raw % range
        #(val, next_seed)
    }
}

pub fn list_of(gen: Gen(a), count: Int) -> Gen(List(a)) {
    fn(seed) { do_list_of(gen, count, seed, []) }
}

fn do_list_of(
    gen: Gen(a),
    remaining: Int,
    seed: Seed,
    acc: List(a),
) -> #(List(a), Seed) {
    case remaining {
        0 -> #(list.reverse(acc), seed)
        _ -> {
            let #(val, next_seed) = gen(seed)
            do_list_of(gen, remaining - 1, next_seed, [val, ..acc])
        }
    }
}

The Check Function

i
// Run a property against `trials` generated values starting from `seed`.
// Returns Ok(Nil) if all trials pass, or Error(failing_value) on the first failure.
pub fn check(
    gen: Gen(a),
    prop: fn(a) -> Bool,
    trials: Int,
    seed: Seed,
) -> Result(Nil, a) {
    do_check(gen, prop, trials, seed)
}

fn do_check(
    gen: Gen(a),
    prop: fn(a) -> Bool,
    remaining: Int,
    seed: Seed,
) -> Result(Nil, a) {
    case remaining {
        0 -> Ok(Nil)
        _ -> {
            let #(val, next_seed) = gen(seed)
            case prop(val) {
                True -> do_check(gen, prop, remaining - 1, next_seed)
                False -> Error(val)
            }
        }
    }
}

Running the Example

i
    let seed = Seed(42)
    let int_gen = int_between(-50, 50)
    let list_gen = list_of(int_between(-100, 100), 8)

    let reverse_twice =
        check(list_gen, fn(xs) { list.reverse(list.reverse(xs)) == xs }, 200, seed)
    io.println("reverse twice: " <> string.inspect(reverse_twice))

    let sort_idempotent = check(
        list_gen,
        fn(xs) {
            let sorted = list.sort(xs, int.compare)
            list.sort(sorted, int.compare) == sorted
        },
        200,
        seed,
    )
    io.println("sort idempotent: " <> string.inspect(sort_idempotent))

    // This property is false: not every integer is non-negative.
    let always_positive = check(int_gen, fn(n) { n >= 0 }, 200, seed)
    io.println(
        "all non-negative (should fail): " <> string.inspect(always_positive),
    )

Testing

i
pub fn next_changes_seed_test() {
    let #(_, s2) = next(Seed(1))
    let #(_, s3) = next(s2)
    s2 |> should.not_equal(s3)
}

pub fn int_between_in_range_test() {
    let gen = int_between(1, 10)
    let #(val, _) = gen(Seed(99))
    { val >= 1 && val <= 10 } |> should.be_true()
}

pub fn int_between_deterministic_test() {
    let gen = int_between(0, 100)
    let #(a, _) = gen(Seed(7))
    let #(b, _) = gen(Seed(7))
    a |> should.equal(b)
}

pub fn list_of_length_test() {
    let gen = list_of(int_between(0, 10), 5)
    let #(result, _) = gen(Seed(1))
    list.length(result) |> should.equal(5)
}

pub fn reverse_twice_is_identity_test() {
    let gen = list_of(int_between(-100, 100), 10)
    check(gen, fn(xs) { list.reverse(list.reverse(xs)) == xs }, 100, Seed(42))
    |> should.equal(Ok(Nil))
}

pub fn sort_idempotent_test() {
    let gen = list_of(int_between(-100, 100), 10)
    check(
        gen,
        fn(xs) {
            let sorted = list.sort(xs, int.compare)
            list.sort(sorted, int.compare) == sorted
        },
        100,
        Seed(42),
    )
    |> should.equal(Ok(Nil))
}

pub fn check_finds_counterexample_test() {
    // Claiming every integer in [0..10] is > 5 is false.
    let gen = int_between(0, 10)
    check(gen, fn(n) { n > 5 }, 200, Seed(1))
    |> should.be_error()
}

pub fn check_passes_when_true_test() {
    let gen = int_between(1, 100)
    check(gen, fn(n) { n > 0 }, 200, Seed(42))
    |> should.equal(Ok(Nil))
}

Check Understanding

Why does check return Error(val) where val is the failing input, rather than just Error("property failed")?

Knowing which value triggered the failure is the whole point of property testing. A bare error message tells you the test failed; the failing value tells you what to investigate. In a real library, you would then pass that value to a shrinker to find the minimal failing case. Returning Error(val) also keeps the return type polymorphic in a, meaning check works for any generator without needing a separate error type.

The int_between(0, 10) generator uses raw % range to stay in bounds. What goes wrong if lo > hi?

range = hi - lo + 1 becomes zero or negative. Division by zero in Erlang raises a badarith exception, crashing the process. A production generator would validate that lo <= hi and return Error or panic with a useful message. For this lesson, the contract is that lo <= hi; violating it is a programming error, not a runtime event to handle.

Exercises

String generator (15 minutes)

Write string_of(chars: List(String), len: Int) -> Gen(String) that generates a string of exactly len characters chosen randomly from chars. Use list_of and int_between(0, list.length(chars) - 1) as the character picker. Write a property test confirming that every generated string has the right length.

Roundtrip property (15 minutes)

Using int_between, write a property test for the encode/decode roundtrip from the binary lesson: for any integer in the range that the packer supports, decode(encode(n)) == n. Run 500 trials.

Shrinking (20 minutes)

Add a shrink_int(n: Int) -> List(Int) function that returns a list of smaller candidate values to try: [0, n / 2, n - 1] (filtered to remove duplicates and values equal to n). Modify do_check to call shrink_int on the first failing value and keep shrinking as long as smaller values also fail, returning the smallest failing value found. Test that a property like n < 3 produces a counterexample of exactly 3.

Pair generator (10 minutes)

Write pair_of(gen_a: Gen(a), gen_b: Gen(b)) -> Gen(#(a, b)) that generates a pair by running both generators in sequence, threading the seed through. Use it to write a property test confirming that dict.merge(a, b) always contains at least as many keys as dict.size(a).