Binary Data Packer
- BitArray represents sequences of bits and bytes
- <<n:32-big, rest:bits>> in a pattern destructures a BitArray into a 32-bit big-endian integer and the remaining bytes
- A format descriptor type (Fmt) mirrors the value type (Value) but carries no data, separating schema from content
- Encoding and decoding recurse over paired lists of formats and values
- Result(List(Value), String) propagates decode errors cleanly
The Value and Format Types
- Text representations like JSON are easy to read but wasteful
  - The integer 1000000 takes seven bytes as ASCII but only four bytes as a 32-bit integer
  - And good luck representing a movie as characters
- Two types work together:
pub type Value {
  VInt(Int)
  VStr(String)
}

pub type Fmt {
  FInt
  FStr
}
- Value represents a single value that can be packed: either an integer or a string
- Fmt is the schema descriptor: it says what kind of value to expect at each position
- FInt and FStr carry no payload: they are pure tags
- Keeping them separate means the same format list can pack different value lists without mixing concerns
- The closest equivalent in Python is the struct format string ">I7s"
- But that is an untyped string with no compiler help
- Here the compiler guarantees that a format list holds only Fmt tags and a value list holds only Value values, and the pattern match in pack checks at run time that every FInt in the format list lines up with a VInt in the value list
Packing Values
- pack encodes a list of values guided by a format list:
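// string_to_bytes is a helper defined elsewhere in this lesson;
// it is expected to return the UTF-8 bytes of a String.
pub fn pack(formats: List(Fmt), values: List(Value)) -> BitArray {
  case formats, values {
    // Both lists exhausted: nothing left to write.
    [], [] -> <<>>
    [FInt, ..frest], [VInt(n), ..vrest] ->
      <<n:32-big, pack(frest, vrest):bits>>
    [FStr, ..frest], [VStr(s), ..vrest] -> {
      let bytes = string_to_bytes(s)
      let len = bit_array.byte_size(bytes)
      // Length prefix first, then the bytes, then the rest of the record.
      <<len:32-big, bytes:bits, pack(frest, vrest):bits>>
    }
    // Formats and values do not line up: stop and emit nothing further.
    _, _ -> <<>>
  }
}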
- <<>> is an empty BitArray
- <<n:32-big, pack(frest, vrest):bits>> concatenates a 32-bit big-endian integer with the recursive result
- :bits splices a BitArray into a larger one
- For strings, string_to_bytes converts the String to UTF-8 bytes
- bit_array.byte_size measures it
- Then the length is written as a 32-bit prefix before the bytes
- The final _, _ -> <<>> handles mismatched lists
  - In production code this would return a Result
- The BitArray literal syntax mirrors pattern-matching syntax
- The same annotations (32-big, utf8, bits) appear on both sides
- This symmetry makes encoding and decoding code look parallel
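To make the byte layout concrete, the sketch below packs a name and an age and compares the result with a hand-written literal. It assumes bit_array.from_string from the standard library as a stand-in for the string_to_bytes helper, whose definition is not shown in this section.

```gleam
import gleam/bit_array

pub type Value {
  VInt(Int)
  VStr(String)
}

pub type Fmt {
  FInt
  FStr
}

// Same shape as the pack function above; bit_array.from_string is an
// assumed stand-in for the string_to_bytes helper.
pub fn pack(formats: List(Fmt), values: List(Value)) -> BitArray {
  case formats, values {
    [], [] -> <<>>
    [FInt, ..frest], [VInt(n), ..vrest] ->
      <<n:32-big, pack(frest, vrest):bits>>
    [FStr, ..frest], [VStr(s), ..vrest] -> {
      let bytes = bit_array.from_string(s)
      let len = bit_array.byte_size(bytes)
      <<len:32-big, bytes:bits, pack(frest, vrest):bits>>
    }
    _, _ -> <<>>
  }
}

pub fn main() {
  let packed = pack([FStr, FInt], [VStr("Ada"), VInt(30)])
  // A 4-byte length prefix (3), the three UTF-8 bytes of "Ada",
  // then 30 as a 4-byte big-endian integer: 11 bytes in total.
  let expected = <<0, 0, 0, 3, "Ada":utf8, 0, 0, 0, 30>>
  let assert True = packed == expected
  let assert 11 = bit_array.byte_size(packed)
}
```

The string field costs four bytes of overhead for its length prefix, which is the price of supporting variable-length data without a terminator byte.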
Unpacking Values
- Decoding reverses the process by consuming bytes guided by the same format list:
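// bytes_to_string is a helper defined elsewhere in this lesson;
// it is expected to turn UTF-8 bytes back into a String.
fn unpack_loop(
  formats: List(Fmt),
  data: BitArray,
  acc: List(Value),
) -> Result(List(Value), String) {
  case formats, data {
    // Every format consumed: restore the original order.
    [], _ -> Ok(list.reverse(acc))
    [FInt, ..frest], <<n:32-big, rest:bits>> ->
      unpack_loop(frest, rest, [VInt(n), ..acc])
    [FStr, ..frest], <<len:32-big, str_data:bytes-size(len), rest:bits>> -> {
      let s = str_data |> bytes_to_string
      unpack_loop(frest, rest, [VStr(s), ..acc])
    }
    // No pattern matched: too few bytes for the next field.
    _, _ -> Error("unexpected end of data")
  }
}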
- <<n:32-big, rest:bits>> binds the first four bytes as an integer and rest as the remaining bytes
- <<len:32-big, str_data:bytes-size(len), rest:bits>> reads the length prefix, then reads exactly len bytes into str_data
- bytes_to_string converts those bytes back to a String; the standard-library function for this, bit_array.to_string(str_data), returns Ok(s), or Error(Nil) if the bytes are not valid UTF-8
- Values are accumulated in reverse and reversed at the end
- Error("unexpected end of data") fires when the pattern match fails, meaning the BitArray ran out of bytes before the format list did
- The bytes-size(len) annotation is the key to variable-length fields
  - It reads exactly len bytes, where len was bound by the preceding len:32-big
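Since bit_array.to_string returns a Result, one option is to handle it directly in the string clause instead of hiding it in a helper. The sketch below is one possible shape, not the lesson's own code: the public unpack wrapper and the "invalid UTF-8" message are assumptions made for illustration.

```gleam
import gleam/bit_array
import gleam/list

pub type Value {
  VInt(Int)
  VStr(String)
}

pub type Fmt {
  FInt
  FStr
}

// Assumed entry point: start the loop with an empty accumulator.
pub fn unpack(
  formats: List(Fmt),
  data: BitArray,
) -> Result(List(Value), String) {
  unpack_loop(formats, data, [])
}

fn unpack_loop(
  formats: List(Fmt),
  data: BitArray,
  acc: List(Value),
) -> Result(List(Value), String) {
  case formats, data {
    [], _ -> Ok(list.reverse(acc))
    [FInt, ..frest], <<n:32-big, rest:bits>> ->
      unpack_loop(frest, rest, [VInt(n), ..acc])
    // bit_array.to_string returns Error(Nil) for invalid UTF-8,
    // which is turned into a decode error here.
    [FStr, ..frest], <<len:32-big, str_data:bytes-size(len), rest:bits>> ->
      case bit_array.to_string(str_data) {
        Ok(s) -> unpack_loop(frest, rest, [VStr(s), ..acc])
        Error(Nil) -> Error("invalid UTF-8")
      }
    _, _ -> Error("unexpected end of data")
  }
}

pub fn main() {
  // Length prefix 2 followed by "hi", then the integer 7.
  let assert Ok([VStr("hi"), VInt(7)]) =
    unpack([FStr, FInt], <<2:32-big, "hi":utf8, 7:32-big>>)
  // The byte 0xFF never appears in valid UTF-8.
  let assert Error("invalid UTF-8") = unpack([FStr], <<1:32-big, 0xFF>>)
  // Only one byte where four are needed.
  let assert Error("unexpected end of data") = unpack([FInt], <<1>>)
}
```

Keeping the two error messages distinct lets a caller tell a truncated message apart from one whose string payload was damaged.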
Handling Corrupt Data
- Decoding real-world data requires dealing with truncation and corruption
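// Assumes gleam/bit_array, gleam/int, gleam/io and gleam/string are
// imported at the top of the module, and that unpack is a public
// wrapper that calls unpack_loop with an empty accumulator.
pub fn main() {
  let formats = [FStr, FInt]
  let values = [VStr("Ada"), VInt(30)]

  let packed = pack(formats, values)
  io.println("packed byte size: " <> int.to_string(bit_array.byte_size(packed)))

  let unpacked = unpack(formats, packed)
  io.println(string.inspect(unpacked))

  // One extra byte after the last field.
  let corrupted = <<packed:bits, 255>>
  // Despite its name, failed holds an Ok value: see the notes that follow.
  let failed = unpack(formats, corrupted)
  io.println(string.inspect(failed))
}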
- <<packed:bits, 255>> appends an extra byte to simulate a corrupted or over-long message
- unpack is called with the same format list and the corrupted data
- Because the format list is exhausted first (all values decoded), the extra byte is silently ignored
- Ok(list.reverse(acc)) fires when the format list is empty, regardless of remaining bytes
- To reject trailing bytes, replace the base case with two clauses: [], <<>> -> Ok(list.reverse(acc)) followed by [], _ -> Error("trailing bytes")
- The debug output shows that the round-trip succeeds
- I.e., the extra byte does not corrupt the result
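The stricter base case can be sketched as follows. To keep the example short it handles integer fields only, and the name unpack_strict is hypothetical rather than part of the lesson's code.

```gleam
import gleam/list

pub type Fmt {
  FInt
}

pub type Value {
  VInt(Int)
}

// Strict variant: the base case only succeeds on an empty BitArray.
pub fn unpack_strict(
  formats: List(Fmt),
  data: BitArray,
  acc: List(Value),
) -> Result(List(Value), String) {
  case formats, data {
    [], <<>> -> Ok(list.reverse(acc))
    [], _ -> Error("trailing bytes")
    [FInt, ..frest], <<n:32-big, rest:bits>> ->
      unpack_strict(frest, rest, [VInt(n), ..acc])
    _, _ -> Error("unexpected end of data")
  }
}

pub fn main() {
  let packed = <<30:32-big>>
  // Exactly four bytes for one integer: accepted.
  let assert Ok([VInt(30)]) = unpack_strict([FInt], packed, [])
  // One byte left over after the last field: rejected.
  let assert Error("trailing bytes") =
    unpack_strict([FInt], <<packed:bits, 255>>, [])
}
```

The order of the two base clauses matters: the empty pattern has to come first, because the wildcard clause would otherwise match an empty BitArray as well.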
BitArray Literal Syntax Reference
These annotations appear identically in construction (<<n:32-big>>)
and in pattern matching (<<n:32-big, rest:bits>>),
except that utf8 in a pattern can only match a string literal, not bind a variable.
| Annotation | Meaning |
|---|---|
| n:8 | 8-bit unsigned integer |
| n:16-big | 16-bit big-endian integer |
| n:32-big | 32-bit big-endian integer |
| n:64-big | 64-bit big-endian integer |
| s:utf8 | UTF-8 encoded string |
| data:bytes | a BitArray as raw bytes |
| data:bits | a BitArray as raw bits |
| data:bytes-size(len) | exactly len bytes (len must be bound earlier) |
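A small sketch of how these annotations read on both sides of an assignment. The variable names (message, kind, id, body, name) are illustrative only.

```gleam
import gleam/bit_array

pub fn main() {
  // Construction: an 8-bit tag, a 16-bit big-endian id, a UTF-8 payload.
  let message = <<7:8, 513:16-big, "ok":utf8>>
  // 513 is 2 * 256 + 1, so its two bytes are 2 and 1.
  let assert True = message == <<7, 2, 1, 111, 107>>

  // Pattern matching: the same annotations pull the fields back out.
  let assert <<kind:8, id:16-big, body:bytes>> = message
  let assert True = kind == 7 && id == 513
  let assert Ok("ok") = bit_array.to_string(body)

  // bytes-size(len) takes exactly len bytes, with len bound earlier
  // in the same pattern.
  let assert <<len:8, name:bytes-size(len), rest:bits>> =
    <<3, "Ada":utf8, 255>>
  let assert True = len == 3 && name == <<"Ada":utf8>> && rest == <<255>>
}
```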
Python's struct.pack(">I", 42) and struct.unpack(">I", data) do the same job
but the format string is parsed at runtime.
A typo in ">II" (two integers) is a runtime error, not a compile error.
Gleam checks the BitArray annotations at compile time.
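The format descriptor list (List(Fmt)) is also statically typed:
it can only contain Fmt tags, just as the value list can only contain Value values.
Whether [FInt, FStr, FInt] is actually paired with [VInt(...), VStr(...), VInt(...)]
is checked at run time by the pattern match in pack.
A mismatched list falls through to the _, _ -> <<>> fallback,
which could be made into a Result error.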
The tradeoff is that Python's struct handles dozens of format characters
(floats, signed integers, padding) out of the box.
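Gleam's BitArray handles arbitrary bit widths and endianness natively,
and the float annotation covers 64-bit IEEE 754 floats,
but in this packer every additional field type needs its own variant in Fmt and Value
and its own clause in pack and unpack_loop.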
Testing
// Assumes gleeunit/should is imported at the top of the test module.
pub fn roundtrip_int_test() {
  let formats = [FInt]
  let values = [VInt(42)]
  pack(formats, values)
  |> unpack(formats, _)
  |> should.equal(Ok([VInt(42)]))
}

pub fn roundtrip_string_test() {
  let formats = [FStr]
  let values = [VStr("Gleam")]
  pack(formats, values)
  |> unpack(formats, _)
  |> should.equal(Ok([VStr("Gleam")]))
}

pub fn roundtrip_mixed_test() {
  let formats = [FInt, FStr, FInt]
  let values = [VInt(1), VStr("hello"), VInt(2)]
  pack(formats, values)
  |> unpack(formats, _)
  |> should.equal(Ok(values))
}
- Each test packs a value and immediately unpacks it, checking that the decoded list equals the original
- truncated_data_test passes only one byte for an integer field
  - The pattern <<n:32-big, rest:bits>> cannot match, so Error("unexpected end of data") is returned
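The body of truncated_data_test is not shown in the listing. A sketch of what such a test might look like, assuming gleeunit's should module (which the round-trip tests appear to use) and an integer-only stand-in for unpack so that the example is self-contained:

```gleam
import gleam/list
import gleeunit/should

pub type Fmt {
  FInt
}

pub type Value {
  VInt(Int)
}

// Integer-only stand-in for the lesson's unpack, kept short for the test.
pub fn unpack(
  formats: List(Fmt),
  data: BitArray,
) -> Result(List(Value), String) {
  unpack_loop(formats, data, [])
}

fn unpack_loop(
  formats: List(Fmt),
  data: BitArray,
  acc: List(Value),
) -> Result(List(Value), String) {
  case formats, data {
    [], _ -> Ok(list.reverse(acc))
    [FInt, ..frest], <<n:32-big, rest:bits>> ->
      unpack_loop(frest, rest, [VInt(n), ..acc])
    _, _ -> Error("unexpected end of data")
  }
}

pub fn truncated_data_test() {
  // One byte supplied where a 32-bit integer needs four.
  unpack([FInt], <<1>>)
  |> should.equal(Error("unexpected end of data"))
}
```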
Check Understanding
What is big-endian and why does it matter?
Big-endian stores the most significant byte first.
The number 1 as a 32-bit big-endian integer is 00 00 00 01.
Little-endian (used by x86 processors) reverses this: 01 00 00 00.
Network protocols like TCP/IP use big-endian,
sometimes called "network byte order".
When two systems with different native endianness communicate,
using an explicit annotation (32-big or 32-little)
ensures both sides agree on the byte order.
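The byte orders described above can be checked directly with BitArray literals. This is a minimal sketch; the variable n is illustrative.

```gleam
pub fn main() {
  // Big-endian: most significant byte first.
  let assert True = <<1:32-big>> == <<0, 0, 0, 1>>
  // Little-endian: least significant byte first.
  let assert True = <<1:32-little>> == <<1, 0, 0, 0>>

  // Reading big-endian bytes with a little-endian annotation does not
  // fail; it silently produces a different number (1 * 2^24).
  let assert <<n:32-little>> = <<1:32-big>>
  let assert True = n == 16_777_216
}
```

The last case is why the annotation matters: a byte-order mismatch is not reported as a decode error, it just yields the wrong value.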
Exercises
Name and age record (15 minutes)
Pack a record containing a name (String) and an age (Int).
Unpack it and confirm with should.equal.
Then intentionally truncate the packed bytes to one fewer byte
and confirm unpack returns an Error.
Reject trailing bytes (10 minutes)
Modify unpack_loop so that leftover bytes after all format fields have been consumed
produce Error("trailing bytes").
Add a test that packs [VInt(1)] and then appends an extra byte,
confirming the error is returned.
Add a float type (20 minutes)
Add VFloat(Float) to Value and FFloat to Fmt.
Gleam floats are 64-bit IEEE 754.
The annotation for a 64-bit float in a BitArray is f:float.
Update pack and unpack_loop to handle the new variant and write two tests.
Nested record (20 minutes)
Design a format that packs a list of records,
where the list itself is length-prefixed.
Add FList(List(Fmt)) to Fmt and handle it in pack and unpack_loop.
A packed list starts with a 32-bit count
followed by that many repetitions of the inner format.