Binary Data Packer

The Value and Format Types

pub type Value {
  VInt(Int)
  VStr(String)
}

pub type Fmt {
  FInt
  FStr
}

Packing Values

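import gleam/bit_array

// Packs each value according to its format descriptor.
// Integers become 4 big-endian bytes; strings become a 4-byte length
// prefix followed by their UTF-8 bytes.
pub fn pack(formats: List(Fmt), values: List(Value)) -> BitArray {
  case formats, values {
    [], [] -> <<>>
    [FInt, ..frest], [VInt(n), ..vrest] ->
      <<n:32-big, pack(frest, vrest):bits>>
    [FStr, ..frest], [VStr(s), ..vrest] -> {
      let bytes = bit_array.from_string(s)
      let len = bit_array.byte_size(bytes)
      <<len:32-big, bytes:bits, pack(frest, vrest):bits>>
    }
    // Mismatched or unequal-length lists silently produce no further bytes.
    _, _ -> <<>>
  }
}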
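
To make the byte layout concrete, the sketch below repeats the types and pack so that it compiles on its own, then compares the packed result against a hand-written literal. It assumes bit_array.from_string from gleam_stdlib for the string-to-bytes step. The main function is only there for illustration.

```gleam
import gleam/bit_array
import gleam/io

pub type Value {
  VInt(Int)
  VStr(String)
}

pub type Fmt {
  FInt
  FStr
}

pub fn pack(formats: List(Fmt), values: List(Value)) -> BitArray {
  case formats, values {
    [], [] -> <<>>
    [FInt, ..frest], [VInt(n), ..vrest] ->
      <<n:32-big, pack(frest, vrest):bits>>
    [FStr, ..frest], [VStr(s), ..vrest] -> {
      // Assumption: bit_array.from_string does the string-to-bytes conversion.
      let bytes = bit_array.from_string(s)
      let len = bit_array.byte_size(bytes)
      <<len:32-big, bytes:bits, pack(frest, vrest):bits>>
    }
    _, _ -> <<>>
  }
}

pub fn main() {
  let packed = pack([FStr, FInt], [VStr("Ada"), VInt(30)])
  // 4-byte length prefix (3), the UTF-8 bytes of "Ada", then 30 in 4 bytes.
  let expected = <<0, 0, 0, 3, "Ada":utf8, 0, 0, 0, 30>>
  io.println(bit_array.inspect(packed))
  let assert True = packed == expected
}
```

The string field costs 4 + 3 bytes and the integer 4 bytes, so the whole record is 11 bytes.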

Unpacking Values

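import gleam/list

// Also uses gleam/bit_array, which pack already relies on.

// Public entry point: starts the loop with an empty accumulator.
pub fn unpack(formats: List(Fmt), data: BitArray) -> Result(List(Value), String) {
  unpack_loop(formats, data, [])
}

fn unpack_loop(
  formats: List(Fmt),
  data: BitArray,
  acc: List(Value),
) -> Result(List(Value), String) {
  case formats, data {
    // All formats consumed; any leftover bytes are ignored.
    [], _ -> Ok(list.reverse(acc))
    [FInt, ..frest], <<n:32-big, rest:bits>> ->
      unpack_loop(frest, rest, [VInt(n), ..acc])
    [FStr, ..frest], <<len:32-big, str_data:bytes-size(len), rest:bits>> ->
      // bit_array.to_string fails if the bytes are not valid UTF-8.
      case bit_array.to_string(str_data) {
        Ok(s) -> unpack_loop(frest, rest, [VStr(s), ..acc])
        Error(Nil) -> Error("invalid UTF-8 in string field")
      }
    _, _ -> Error("unexpected end of data")
  }
}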

Handling Corrupt Data

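import gleam/int
import gleam/io
import gleam/string

pub fn main() {
  let formats = [FStr, FInt]
  let values = [VStr("Ada"), VInt(30)]

  let packed = pack(formats, values)
  io.println("packed byte size: " <> int.to_string(bit_array.byte_size(packed)))

  let unpacked = unpack(formats, packed)
  io.println(string.inspect(unpacked))

  // A stray leading byte shifts every field, so the string length prefix
  // reads as a huge number and no pattern matches. (A trailing byte would
  // not fail: unpack_loop ignores leftover data.)
  let corrupted = <<255, packed:bits>>
  let failed = unpack(formats, corrupted)
  io.println(string.inspect(failed))
}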

BitArray Literal Syntax Reference

Most of these annotations are written the same way in construction (<<n:32-big>>) and in pattern matching (<<n:32-big, rest:bits>>). The exception is utf8, which in a pattern can only match a string literal, not bind a variable.

Annotation              Meaning
n:8                     8-bit unsigned integer
n:16-big                16-bit big-endian integer
n:32-big                32-bit big-endian integer
n:64-big                64-bit big-endian integer
s:utf8                  UTF-8 encoded string (string literals only in patterns)
data:bytes              a BitArray containing a whole number of bytes
data:bits               a BitArray of any bit length
data:bytes-size(len)    exactly len bytes (len must be bound earlier)
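
As a small illustration of the same annotations working in both directions, the sketch below encodes and decodes a made-up frame. The frame layout and the names encode_frame and decode_frame are hypothetical, invented for this example; only bit_array.byte_size comes from gleam_stdlib.

```gleam
import gleam/bit_array

// Hypothetical frame layout, invented for this example:
// 1-byte version, 2-byte big-endian length, then that many payload bytes.
pub fn encode_frame(version: Int, payload: BitArray) -> BitArray {
  let len = bit_array.byte_size(payload)
  <<version:8, len:16-big, payload:bits>>
}

// Returns the version, the payload, and whatever follows the frame.
pub fn decode_frame(data: BitArray) -> Result(#(Int, BitArray, BitArray), Nil) {
  case data {
    // len is bound by the second segment and used by the third.
    <<version:8, len:16-big, payload:bytes-size(len), rest:bits>> ->
      Ok(#(version, payload, rest))
    _ -> Error(Nil)
  }
}

pub fn main() {
  let frame = encode_frame(1, <<"hi":utf8>>)
  // Append one extra byte to show that rest captures what is left over.
  let assert Ok(#(1, payload, rest)) = decode_frame(<<frame:bits, 9>>)
  let assert True = payload == <<"hi":utf8>>
  let assert True = rest == <<9>>
}
```

The pattern in decode_frame mirrors the literal in encode_frame segment by segment, which is what makes the two easy to keep in sync.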

Python's struct.pack(">I", 42) and struct.unpack(">I", data) do the same job, but the format string is parsed at runtime. Writing ">II" (two integers) by mistake while passing a single value raises struct.error when the call runs, not when the code is compiled.

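Gleam checks the BitArray annotations at compile time. The format and value lists are statically typed as List(Fmt) and List(Value), but the compiler does not check that the two lists line up: [FInt, FStr, FInt] is meant to be paired with [VInt(...), VStr(...), VInt(...)], and that pairing is only verified by the case expression at runtime. A mismatched list falls through to the _, _ -> <<>> fallback, which silently truncates the output; returning a Result instead would make the error visible.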
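
One way the fallback could report a mismatch is sketched below. The name pack_checked and the error message are hypothetical, and the types are repeated so the block compiles on its own; result.try and bit_array.from_string are from gleam_stdlib.

```gleam
import gleam/bit_array
import gleam/result

pub type Value {
  VInt(Int)
  VStr(String)
}

pub type Fmt {
  FInt
  FStr
}

// Hypothetical variant of pack: a mismatch becomes an Error instead of <<>>.
pub fn pack_checked(
  formats: List(Fmt),
  values: List(Value),
) -> Result(BitArray, String) {
  case formats, values {
    [], [] -> Ok(<<>>)
    [FInt, ..frest], [VInt(n), ..vrest] -> {
      // Pack the tail first so an error anywhere aborts the whole call.
      use tail <- result.try(pack_checked(frest, vrest))
      Ok(<<n:32-big, tail:bits>>)
    }
    [FStr, ..frest], [VStr(s), ..vrest] -> {
      use tail <- result.try(pack_checked(frest, vrest))
      let bytes = bit_array.from_string(s)
      let len = bit_array.byte_size(bytes)
      Ok(<<len:32-big, bytes:bits, tail:bits>>)
    }
    _, _ -> Error("format and value lists do not match")
  }
}

pub fn main() {
  let assert Ok(_) = pack_checked([FInt, FStr], [VInt(1), VStr("a")])
  // A VStr where FInt is expected type-checks but fails at runtime.
  let assert Error(_) = pack_checked([FInt], [VStr("oops")])
  // Lists of different lengths fail the same way.
  let assert Error(_) = pack_checked([FInt, FInt], [VInt(1)])
}
```

The cost of this design is that every caller now has to handle the Error case, which is usually what you want for data headed to a file or socket.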

The tradeoff is that Python's struct handles a ready-made menu of format characters (floats, signed and unsigned integers, padding) out of the box. Gleam's BitArray syntax handles arbitrary bit widths, endianness, and floats (the float option) natively, but this packer only understands the Fmt variants defined above, so every additional field type needs its own clause in pack and unpack_loop.

Testing

import gleeunit/should

pub fn roundtrip_int_test() {
  let formats = [FInt]
  let values = [VInt(42)]
  pack(formats, values)
  |> unpack(formats, _)
  |> should.equal(Ok([VInt(42)]))
}

pub fn roundtrip_string_test() {
  let formats = [FStr]
  let values = [VStr("Gleam")]
  pack(formats, values)
  |> unpack(formats, _)
  |> should.equal(Ok([VStr("Gleam")]))
}

pub fn roundtrip_mixed_test() {
  let formats = [FInt, FStr, FInt]
  let values = [VInt(1), VStr("hello"), VInt(2)]
  pack(formats, values)
  |> unpack(formats, _)
  |> should.equal(Ok(values))
}

Check Understanding

What is big-endian and why does it matter?

Big-endian stores the most significant byte first. The number 1 as a 32-bit big-endian integer is 00 00 00 01. Little-endian (used by x86 processors) reverses this: 01 00 00 00. Network protocols like TCP/IP use big-endian, sometimes called "network byte order". When two systems with different native endianness communicate, using an explicit annotation (32-big or 32-little) ensures both sides agree on the byte order.
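
The byte orders described above can be checked directly with BitArray literals. This is a minimal sketch with no dependencies; the main function is only for illustration.

```gleam
pub fn main() {
  // 1 as a 32-bit big-endian integer: most significant byte first.
  let big = <<1:32-big>>
  let assert True = big == <<0, 0, 0, 1>>

  // The same value in little-endian: least significant byte first.
  let little = <<1:32-little>>
  let assert True = little == <<1, 0, 0, 0>>

  // Reading little-endian bytes with a big-endian annotation yields a
  // different number: 0x01000000.
  let assert <<n:32-big>> = little
  let assert True = n == 16_777_216
}
```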

Exercises

Name and age record (15 minutes)

Pack a record containing a name (String) and an age (Int). Unpack it and confirm the result with should.equal. Then drop the last byte of the packed data and confirm that unpack returns an Error.

Reject trailing bytes (10 minutes)

Modify unpack_loop so that leftover bytes after all format fields have been consumed produce Error("trailing bytes"). Add a test that packs [VInt(1)] and then appends an extra byte, confirming the error is returned.

Add a float type (20 minutes)

Add VFloat(Float) to Value and FFloat to Fmt. Gleam floats are 64-bit IEEE 754. The annotation for a 64-bit float in a BitArray is f:float. Update pack and unpack_loop to handle the new variant and write two tests.
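
As a starting hint rather than a solution, the sketch below shows the float annotation on its own, outside pack and unpack_loop. It relies on the float option defaulting to 64 bits; the main function is only for illustration.

```gleam
import gleam/bit_array

pub fn main() {
  // Construction: a Float becomes 8 bytes (64-bit IEEE 754).
  let packed = <<1.5:float>>
  let assert True = bit_array.byte_size(packed) == 8

  // Pattern matching: the same annotation reads the value back and
  // leaves the remaining byte in rest.
  let assert <<f:float, rest:bits>> = <<packed:bits, 7>>
  let assert True = f == 1.5
  let assert True = rest == <<7>>
}
```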

Nested record (20 minutes)

Design a format that packs a list of records, where the list itself is length-prefixed. Add FList(List(Fmt)) to Fmt and handle it in pack and unpack_loop. A packed list starts with a 32-bit count followed by that many repetitions of the inner format.