Glob Pattern Matcher

The Problem

Representing Pattern Elements

i
pub type Elem {
  Literal(String)
  AnyChar
  Wildcard
}

Matching a Pattern Against Text

i
pub fn match_pattern(pattern: List(Elem), text: String) -> Bool {
  match_pattern_chars(pattern, string_to_chars(text))
}

fn match_pattern_chars(pattern: List(Elem), chars: List(String)) -> Bool {
  case pattern, chars {
    [], [] -> True
    [], _ -> False
    [Literal(c), ..pat_rest], _ ->
      case drop_prefix(string.to_graphemes(c), chars) {
        Ok(remaining) -> match_pattern_chars(pat_rest, remaining)
        Error(_) -> False
      }
    [AnyChar, ..pat_rest], [_, ..char_rest] ->
      match_pattern_chars(pat_rest, char_rest)
    [Wildcard, ..pat_rest], _ -> {
      case match_pattern_chars(pat_rest, chars) {
        True -> True
        False -> {
          case chars {
            [] -> False
            [_, ..char_rest] -> match_pattern_chars(pattern, char_rest)
          }
        }
      }
    }
    _, _ -> False
  }
}

Handling Wildcards

[Wildcard, ..pat_rest], _ -> {
  case match_pattern_chars(pat_rest, chars) {
    True -> True
    False -> {
      case chars {
        [] -> False
        [_, ..char_rest] -> match_pattern_chars(pattern, char_rest)
      }
    }
  }
}

Parsing Patterns

i
fn parse_pattern(s: String) -> List(Elem) {
  let chars = string.to_graphemes(s)
  parse_chars(chars, [])
}

fn parse_chars(chars: List(String), acc: List(Elem)) -> List(Elem) {
  case chars {
    [] -> list.reverse(acc)
    ["*", ..rest] -> parse_chars(rest, [Wildcard, ..acc])
    ["?", ..rest] -> parse_chars(rest, [AnyChar, ..acc])
    _ -> {
      let #(literal, remaining) = take_literal(chars, [])
      parse_chars(remaining, [Literal(literal), ..acc])
    }
  }
}

Running the Examples

i
gleam run --module match_demo
i
[Wildcard, Literal(".gleam")]
[Literal("data"), AnyChar, Literal(".csv")]
[Literal("hello")]

Writing Tests

i
pub fn literal_match_test() {
  match_pattern([Literal("a")], "a")
  |> should.be_true()
}

pub fn anychar_test() {
  match_pattern([AnyChar], "x")
  |> should.be_true()
}

pub fn wildcard_many_test() {
  match_pattern([Wildcard, Literal(".gleam")], "mymodule.gleam")
  |> should.be_true()
}

Filtering a List of Files

Combining the parser and matcher gives a useful utility:

i
pub fn filter_files(
  pattern: String,
  files: List(String),
) -> List(String) {
  let elems = parse_pattern(pattern)
  list.filter(files, match_pattern(elems, _))
}

Check Understanding

Why does wildcard try zero characters first?

If the wildcard tried consuming a character first, a pattern like *x against "x" would fail: the wildcard would consume x, leaving nothing for the literal x to match. Trying zero first means a wildcard only uses up characters when the rest of the pattern cannot match without them.

Why does parse_chars prepend to acc and call list.reverse at the end rather than appending each element?

Gleam lists are singly-linked, so prepending ([elem, ..acc]) is O(1): it just creates a new node pointing at the existing list. Appending to the end, by contrast, requires walking the whole list to find the last node, making each append O(n) and the full parse O(n²). Building the list in reverse with prepend and reversing once at the end keeps the total work O(n).

Exercises

Parse integration (15 minutes)

Wire parse_pattern to filter_files so the function accepts a String pattern rather than a List(Elem). Write gleeunit tests for filter_files that cover: - an empty file list - a pattern that matches no files - a pattern that matches all files - a mixed result

Character sets (20 minutes)

Add a CharSet(List(String)) variant to Elem. A CharSet matches any single character that appears in its list, so CharSet(["a", "e", "i", "o", "u"]) matches any vowel. Update match_pattern_chars to handle the new variant. Write at least four tests.

Negated character sets (15 minutes)

Add NegCharSet(List(String)) that matches any character not in the list. Update the matcher and write tests confirming that it does not match characters in the list but does match characters outside it.

Anchor to start and end (10 minutes)

Shell globs implicitly match anywhere unless anchored. Add Start and End variants that require the match to begin at the first character and end at the last character respectively. Update match_pattern (not just match_pattern_chars) to interpret them.