Log-Structured Key-Value Store

The Log-Structured Idea

The Entry Type

i
pub type Entry {
  Set(key: String, value: String)
  Delete(key: String)
}

Reading from the Log

i
pub fn get(log: List(Entry), key: String) -> Option(String) {
  get_scan(log, key)
}

fn get_scan(log: List(Entry), key: String) -> Option(String) {
  case log {
    [] -> None
    [Set(k, v), ..] if k == key -> Some(v)
    [Delete(k), ..] if k == key -> None
    [_, ..rest] -> get_scan(rest, key)
  }
}

Indexed Lookup

i
pub fn build_index(log: List(Entry)) -> dict.Dict(String, String) {
  list.fold(list.reverse(log), dict.new(), fn(acc, entry) {
    case entry {
      Set(k, v) -> dict.insert(acc, k, v)
      Delete(k) -> dict.delete(acc, k)
    }
  })
}

pub fn get_indexed(
  index: dict.Dict(String, String),
  key: String,
) -> Option(String) {
  case dict.get(index, key) {
    Ok(v) -> Some(v)
    Error(_) -> None
  }
}

Listing Live Keys

The keys function returns all keys that currently have a value:

i
pub fn keys(log: List(Entry)) -> List(String) {
  log
  |> list.fold(dict.new(), fn(acc, entry) {
    case entry {
      Set(k, _) -> dict.insert(acc, k, True)
      Delete(k) -> dict.delete(acc, k)
    }
  })
  |> dict.keys
}

Compaction

i
pub fn compact(log: List(Entry)) -> List(Entry) {
  let live_dict =
    list.fold(list.reverse(log), dict.new(), fn(acc, entry) {
      case entry {
        Set(k, v) -> dict.insert(acc, k, v)
        Delete(k) -> dict.delete(acc, k)
      }
    })

  dict.to_list(live_dict)
  |> list.map(fn(item) { let #(k, v) = item  Set(k, v) })
}

Serialisation

i
fn compact_to_lines(log: List(Entry)) -> List(String) {
  log
  |> list.map(fn(entry) {
    case entry {
      Set(k, v) -> "SET|" <> k <> "|" <> v
      Delete(k) -> "DEL|" <> k
    }
  })
}

fn parse_lines(lines: List(String)) -> List(Entry) {
  list.map(lines, fn(line) {
    case string.split(line, "|") {
      ["SET", k, v] -> Set(k, v)
      ["DEL", k] -> Delete(k)
      _ -> Set("error", "bad line")
    }
  })
}

Testing

i
pub fn set_and_get_test() {
  let log = [Set("x", "1")]
  get(log, "x")
  |> should.equal(Some("1"))
}

pub fn overwrite_test() {
  // newest entry is prepended; Set("x","2") is the most recent write
  let log = [Set("x", "2"), Set("x", "1")]
  get(log, "x")
  |> should.equal(Some("2"))
}

pub fn delete_test() {
  // Delete is prepended after Set, so it is the most recent entry
  let log = [Delete("x"), Set("x", "1")]
  get(log, "x")
  |> should.equal(None)
}

Check Understanding

A log contains ten entries, all setting key "x" to different values. After calling compact, how many entries does the result contain for "x", and why?

One entry. compact folds the reversed log into a Dict, so each key appears at most once. The fold processes entries from oldest to newest; dict.insert overwrites on each call, so the newest value wins. All earlier entries for "x" are shadowed and discarded.

The log [Delete("x"), Set("x", "1")] is newest-first. What does get(log, "x") return? What would it return if the list were reversed to [Set("x", "1"), Delete("x")], and what real-world operation sequence does each order represent?

For [Delete("x"), Set("x", "1")], get returns None: get_scan matches Delete("x") at the head and stops immediately. This represents: key "x" was set first, then deleted.

For [Set("x", "1"), Delete("x")], get returns Some("1"): get_scan matches Set("x", "1") at the head and stops. This represents: key "x" was deleted first, then set again.

The ordering of the list encodes the direction of time.

Exercises

Round-trip serialisation (15 minutes)

Implement to_lines(log: List(Entry)) -> List(String) and from_lines(lines: List(String)) -> List(Entry). Write tests confirming that log |> to_lines |> from_lines equals the original log for a mix of Set and Delete entries.

Indexed get (15 minutes)

Add a function that builds a Dict(String, String) index from the log and uses it for O(1) lookups. Compare the interface to the scan-based get: both return Option(String), so callers do not need to change.

Write-ahead log (20 minutes)

Extend the demo to write each new Entry to a text file using simplifile. Add a load(path: String) -> Result(List(Entry), String) function that reads and parses the file on startup. Write a test that writes a few entries, loads them back, and confirms the result.