A Database
- Build a log-structured key-value store with pure functional operations
- Append new values and tombstones (deletion markers) to an immutable log
- Query with newest-first scanning and compact with deduplication
- Persist to disk using
IO.FS.Handlewith a simple line-based serialization format
- Databases are the backbone of most applications
- We'll build a minimal key-value store from first principles
- Pure core that can be tested without touching disk (like archive)
- Builds on types from types, IO from io, and pipelines from pipes
Log Entries
i
-- A log entry: key paired with an optional value (none = deleted/tombstone)
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
- A log entry is a
(key, value)pair where the value is optionalsome "val": the key has this valuenone: the key has been deleted (a tombstone)
Logis just a list of entries, newest at the endabbrevcreates transparent type aliasesLogEntryandString × Option Stringare interchangeable
- Like Git's commit history: each operation appends a new record
- Nothing is ever truly deleted until you compact
Setting Values
i
-- Append a new value record to the log
def dbSet (log : Log) (key val : String) : Log :=
log ++ [(key, some val)]
dbSetappends a new(key, some val)entry to the log- Does not modify or remove old entries — just adds at the end
- The log only grows; old values are shadowed, not overwritten
- Like Python's
log.append((key, val))but returning a new list instead of mutating
Deleting Values
i
-- Append a tombstone to mark a key as deleted
def dbDel (log : Log) (key : String) : Log :=
log ++ [(key, none)]
dbDelappends a tombstone:(key, none)- The key still appears in the log, but its value is now
none - When queried, the tombstone indicates "this key was deleted"
- Like setting a
deleted_attimestamp in a soft-delete pattern
Getting Values
i
-- Scan newest-first: return the first value found for the key
def dbGet (log : Log) (key : String) : Option String :=
Option.join ((List.reverse log).findSome? (β := Option String) fun (k, v) =>
if k == key then some v else none)
dbGetscans the log newest-first (reversed) and returns the first matchList.reverse logputs newest entries firstfindSome?returns the firstsomeresult from the scan function- The function returns
some vwhen the key matches,noneotherwise
- The function returns
β := Option Stringis an explicit type annotation: Lean cannot infer the inner type offindSome?here without a hintOption.joinflattensOption (Option String)toOption Stringsome (some "val")→some "val"(key found with value)some none→none(tombstone found)none→none(key not in log)
- Like Python's
next((v for k, v in reversed(log) if k == key), None)but handling tombstones
Compaction
i
-- Compact the log: keep only the most recent entry for each key
def dbCompact (log : Log) : Log :=
let (_, kept) := (List.reverse log).foldl (init := ([], [])) fun (seen, acc) (k, v) =>
if seen.contains k then (seen, acc)
else (k :: seen, (k, v) :: acc)
kept
dbCompactremoves redundant entries, keeping only the newest per key- Algorithm: scan reversed log, keep first occurrence of each key
seentracks keys already encounteredkeptaccumulates deduplicated entries
- Scanning newest-first and prepending each new entry to
keptrestores original key order- The first new key encountered (newest "b") is prepended first, then newest "a" is prepended on top of it
- Result: "a" before "b" — same order as the original log
- Same trick as reversing a list with
foldland prepend
- Like Python's
dict(reversed(log))but preserving insertion order of last write - Performance:
dbCompacttraverses the log twice — once to reverse it, once to fold over it- Both passes are O(n); the total cost is O(n) overall
- For large logs this is fine; for hot paths you could compact incrementally instead
Checking Key Existence
i
-- Check if a key exists in the log (has a non-tombstone entry)
def dbExists (log : Log) (key : String) : Bool :=
dbGet log key |>.isSome
dbExistschecks if a key has a non-tombstone valuedbGet log key |>.isSomeistruewhen the value issome _- Returns
falsefor both missing keys and tombstoned keys
Serialization
i
-- Serialize one entry as "key\tvalue\n" or "key\t\n" for tombstone
def serializeEntry (k : String) (v : Option String) : String :=
s!"{k}\t{v.getD ""}\n"
-- Deserialize a "key\tvalue" line back into a LogEntry
def deserializeEntry (line : String) : Option LogEntry :=
match line.splitOn "\t" with
| [k, v] => some (k, if v.isEmpty then none else some v)
| _ => none
serializeEntryformats an entry as"key\tvalue\n"- For tombstones, value is empty:
"key\t\n" v.getD ""extracts the value string, defaulting to""fornone
- For tombstones, value is empty:
deserializeEntryparses a line back into aLogEntryline.splitOn "\t"splits by the tab character- Returns
nonefor malformed lines (wrong number of fields)
- Like CSV but with tabs as delimiters and no quoting
File-Backed Operations
i
-- Append one entry to a file
def appendEntry (path key : String) (val : Option String) : IO Unit := do
let h ← IO.FS.Handle.mk path IO.FS.Mode.append
h.putStr (serializeEntry key val)
h.flush
-- Read entire log from a file
def readLog (path : String) : IO Log := do
let text ← IO.FS.readFile path
pure (text.splitOn "\n" |>.filterMap deserializeEntry)
appendEntryopens a file in append mode and writes a serialized entryIO.FS.Handle.mk path IO.FS.Mode.appendopens a file for appendingh.putStrwrites the string,h.flushensures it's on disk
readLogreads the entire file and parses each lineIO.FS.readFile pathreturns the file contents as a stringsplitOn "\n"breaks into linesfilterMap deserializeEntryparses each line, discarding malformed ones
- The file format is append-only: each new entry goes at the end
- You never rewrite the file in place
- Compaction to a new file is the only time you write a complete log
Tests
i
-- Tests ---------------------------------------------------------------
-- Set and get
#guard dbGet (dbSet [] "x" "1") "x" == some "1"
-- Latest write wins
#guard dbGet (dbSet (dbSet [] "x" "1") "x" "2") "x" == some "2"
-- Tombstone
#guard dbGet (dbDel (dbSet [] "x" "1") "x") "x" == none
-- Compact reduces redundant entries
def messyLog : Log := [("a", some "1"), ("b", some "1"), ("a", some "2"), ("b", none)]
#guard dbCompact messyLog == [("a", some "2"), ("b", none)]
-- Exists check
#guard dbExists (dbSet [] "x" "1") "x"
#guard !dbExists (dbDel (dbSet [] "x" "1") "x") "x"
#guard !dbExists [] "x"
#guardtests verify the pure operations- Set and get: writing a value and reading it back
- Latest write wins: writing the same key twice returns the newer value
- Tombstone: deleting a key returns
none - Compaction: four entries reduce to two after deduplication
"a"has two writes — only the latest ("2") is kept"b"has a write then a tombstone — only the tombstone is kept
- Existence: checks that
dbExistsreturns correct boolean values
Running the Program
i
-- Entry point: demonstrate the database in memory and on disk
-- Run with: lean --run db/code.lean
def main : IO Unit := do
-- In-memory
let log := dbSet (dbSet (dbSet [] "name" "Lean") "lang" "functional") "name" "Lean 4"
IO.println s!"In-memory log: {log}"
IO.println s!" dbGet log \"name\" = {dbGet log "name"}"
IO.println s!" dbGet log \"lang\" = {dbGet log "lang"}"
IO.println s!" Compacted: {dbCompact log}"
-- File-backed: write to a log file, then read it back
let path := "db/demo.log"
IO.println s!"\nWriting to {path}..."
appendEntry path "hello" (some "world")
appendEntry path "count" (some "1")
appendEntry path "count" (some "2") -- latest write wins
appendEntry path "tmp" none -- tombstone
let diskLog ← readLog path
IO.println s!" Read from disk: {diskLog}"
IO.println s!" dbGet \"hello\" = {dbGet diskLog "hello"}"
IO.println s!" dbGet \"count\" = {dbGet diskLog "count"}"
IO.println s!" dbGet \"tmp\" = {dbGet diskLog "tmp"}"
IO.println "Done."
maindemonstrates both in-memory and file-backed operations- In-memory: creates a log, queries it, compacts it
- File-backed: writes entries to a file, reads them back, queries
IO.FS.removeFileis not available in the prelude, so we skip cleanup
- Compile and type-check with:
lake env lean db/code.lean
- Run interactively with:
lean --run db/code.lean
- The file-backed section creates a
db/demo.logfile- You can inspect it with
cat db/demo.log
- You can inspect it with
Exercises
Fix: Wrong Scan Direction
i
-- This scans front-to-back; should scan newest-first for latest-write-wins
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def dbGet (log : Log) (key : String) : Option String :=
Option.join (log.findSome? (β := Option String) fun (k, v) =>
if k == key then some v else none)
#guard dbGet [("x", some "1"), ("x", some "2")] "x" == some "2"
#guard dbGet [("x", none), ("x", some "1")] "x" == none
hint
- The function scans the log front-to-back, returning the oldest value
- Reverse the log before scanning so the newest entry is found first
- Use
List.reverse loginstead oflog
Fix: Compaction Loses Latest Value
i
-- This keeps the oldest value instead of the newest. Fix it.
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def dbCompact (log : Log) : Log :=
let (_, kept) := log.foldl (init := ([], [])) fun (seen, acc) (k, v) =>
if seen.contains k then (seen, acc)
else (k :: seen, (k, v) :: acc)
kept
#guard dbCompact [("a", some "1"), ("b", some "1"), ("a", some "2"), ("b", none)]
== [("a", some "2"), ("b", none)]
hint
- Scanning unreversed log keeps the first occurrence of each key (the oldest)
- Reverse the log before folding so the newest entry is kept first
- The final
keptwill already be in original order after the double-reversal
Fix: Serialization Drops Tombstones
i
-- Tombstones serialize as "DELETED", but should serialize as empty string
def serializeEntry (k : String) (v : Option String) : String :=
s!"{k}\t{v.getD "DELETED"}\n"
def deserializeEntry (line : String) : Option (String × Option String) :=
match line.splitOn "\t" with
| [k, v] => some (k, if v.isEmpty then none else some v)
| _ => none
#guard deserializeEntry (serializeEntry "x" none) == some ("x", none)
#guard deserializeEntry (serializeEntry "x" (some "y")) == some ("x", some "y")
hint
v.getD "DELETED"writes"DELETED"for tombstones instead of""- Tombstones should serialize as empty values so they're distinguishable from valid data
- Use
v.getD ""for an empty tombstone value
Fix: Deserialization Ignores Empty Values
i
-- deserializeEntry always wraps the value in some; should detect tombstones
def deserializeEntry (line : String) : Option (String × Option String) :=
match line.splitOn "\t" with
| [k, v] => some (k, some v)
| _ => none
#guard deserializeEntry "key\tvalue" == some ("key", some "value")
#guard deserializeEntry "key\t" == some ("key", none)
#guard deserializeEntry "bad" == none
hint
- The code always wraps the value in
some, even for tombstones - Check
if v.isEmpty then none else some vto detect tombstones - Empty value after the tab means "deleted"
Fix: Missing do in Append
i
-- Missing do block: two IO actions need sequencing
def appendEntry (path key : String) (val : Option String) : IO Unit :=
let h ← IO.FS.Handle.mk path IO.FS.Mode.append
h.putStr s!"{key}\t{val.getD ""}\n"
h.flush
#eval appendEntry "db/test.log" "hello" (some "world")
hint
- The function body has two IO actions but no
doblock - Add
doafter:=to sequence the handle creation, writing, and flushing
Fix: Tombstone Not Matched
i
-- Tombstones (none values) are not detected; Option.join is missing
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def dbGet (log : Log) (key : String) : Option String :=
(List.reverse log).findSome? (β := Option String) fun (k, v) =>
if k == key then some v else none
#guard dbGet [("x", none), ("x", some "1")] "x" == none
hint
findSome?returns the firstsomeresult, but the lambda returnssome vonly for matching keys- When a tombstone matches,
some noneis returned butOption.joinis missing - Add
Option.jointo flatten the result and properly detect tombstones
Write: Count Entries for a Key
i
-- Write a function that counts how many times a key appears in the log
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def countEntries (log : Log) (key : String) : Nat :=
0
#guard countEntries [("x", some "1"), ("x", some "2"), ("y", some "1")] "x" == 2
#guard countEntries [("x", none), ("x", some "1")] "x" == 2
#guard countEntries [] "x" == 0
hint
- Count how many times a key appears in the log (including tombstones)
- Use
List.foldlwith a counter, orList.filterthenList.length countEntries log "x"should return the total number of(k, v)pairs wherek == "x"
Write: List All Unique Keys
i
-- Write a function that returns all unique keys in the log (deduplicated)
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def keys (log : Log) : List String :=
[]
#guard keys [("a", some "1"), ("b", some "2"), ("a", some "3")] == ["a", "b"]
#guard keys [] == []
hint
- Return a deduplicated list of all keys in the log
- Use
List.mapto extract keys, then deduplicate keysfunction: keep the first occurrence on reversed log (same pattern asdbCompact)
Write: Merge Two Logs
i
-- Write a function that merges two logs (second log's entries are newer)
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def dbMerge (log1 log2 : Log) : Log :=
[]
#guard dbMerge [("x", some "1")] [("x", some "2")] == [("x", some "1"), ("x", some "2")]
#guard dbMerge [] [("a", some "1")] == [("a", some "1")]
hint
- Append one log to the other:
log1 ++ log2 - Entries from
log2are newer and will shadowlog1entries when queried - This is the simplest operation: just concatenate
Write: Batch Update
i
-- Write a function that applies a batch of (key, value) updates to a log
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def dbSet (log : Log) (key val : String) : Log :=
log ++ [(key, some val)]
def batchSet (log : Log) (updates : List (String × String)) : Log :=
log
#guard batchSet [] [("x", "1"), ("y", "2")] == [("x", some "1"), ("y", some "2")]
#guard batchSet [("x", some "0")] [("x", "1")] == [("x", some "0"), ("x", some "1")]
hint
- Apply a list of
(key, value)pairs to a log - Use
List.foldlto iterate over the updates and calldbSetfor each batchSet log [("x", "1"), ("y", "2")]should add both entries
Write: History for a Key
i
-- Write a function that returns the full history of values for a key
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def dbHistory (log : Log) (key : String) : List (Option String) :=
[]
#guard dbHistory [("x", some "1"), ("x", none), ("x", some "2")] "x"
== [some "1", none, some "2"]
#guard dbHistory [("y", some "1")] "x" == []
hint
- Return all values for a key in chronological order (oldest first)
- Filter the log to entries for the key, then extract values
- Include
nonevalues (tombstones) in the history - Use
List.filterMapto both filter and extract in one pass
Write: Format Log as String
i
-- Write a function that formats the entire log as a readable string
abbrev LogEntry := String × Option String
abbrev Log := List LogEntry
def dbDump (log : Log) : String :=
""
#guard dbDump [("x", some "1"), ("y", none)]
== "x: 1\ny: (deleted)"
#guard dbDump [] == ""
hint
- Convert the entire log to a human-readable string
- Use
List.mapto format each entry:s!"{k}: {v.getD "(deleted)"}" - Join with
String.intercalate "\n" - Like a simple database dump command