class: slide-title
Software Design by Example
Matching Patterns
chapter
--- ## The Problem - `glob.glob` finds files whose names match patterns - `2023-*.{pdf,txt}` matches `2023-01.txt` and `2023-final.pdf` but not `draft-2023.docx`
- How does this work? --- class: aside ## Why "Globbing"? - Early versions of Unix had a tool called `glob` - People used pattern matching so often that it was quickly built into the shell --- ## Write a Few Tests | Pattern | Text | Match? || Pattern | Text | Match? | | ------- | -------- | ------ || ------- | -------- | ------ | | abc | "abc" | True || a*c | "abc" | True | | ab | "abc" | False || {a,b} | "a" | True | | abc | "ab" | False || {a,b} | "c" | False | | * | "" | True || {a,b} | "ab" | False | | * | "abc" | True || *{x,y} | "abcx" | True | --- ## Objects vs. Functions - Create matchers for particular cases instead of one big function - Some of those matchers need extra data - E.g., which literal characters to match - So create objects --- ## Design Patterns - Use the
Chain of Responsibility
pattern - Each object matches if it can… - …then asks something else to try to match the rest of the text
--- ## Matching a Literal String ```py class Lit: def __init__(self, chars, rest=None): self.chars = chars self.rest = rest def match(self, text, start=0): end = start + len(self.chars) if text[start:end] != self.chars: return False if self.rest: return self.rest.match(text, end) return end == len(text) ``` - `chars` is the characters to be matched - `rest` is the rest of the chain (or `None`) - `start` is needed when this isn't the first matcher --- ## Testing the Matcher ```py def test_literal_match_entire_string(): # /abc/ matches "abc" assert Lit("abc").match("abc") def test_literal_substring_alone_no_match(): # /ab/ doesn't match "abc" assert not Lit("ab").match("abc") def test_literal_superstring_no_match(): # /abc/ doesn't match "ab" assert not Lit("abc").match("ab") ``` - Give tests long names to make failure reports immediately readable --- ## Does Chaining Work? - Try to find flaws in the design as early as possible - So test chaining before writing more matchers ```py def test_literal_followed_by_literal_match(): # /a/+/b/ matches "ab" assert Lit("a", Lit("b")).match("ab") def test_literal_followed_by_literal_no_match(): # /a/+/b/ doesn't match "ac" assert not Lit("a", Lit("b")).match("ac") ``` --- class: aside ## Test-Driven Development - Some people write tests *before* writing code to clarify the design - And to ensure they actually write tests - Research shows the order doesn't matter
[
Fucci2016
]
- What *does* is alternating between short bursts of coding and testing --- ## Wildcards - `*` can match zero or more characters - If it's the last matcher, it always succeeds - Otherwise try zero characters, one, two, etc. characters ```py class Any: def __init__(self, rest=None): self.rest = rest def match(self, text, start=0): if self.rest is None: return True for i in range(start, len(text)): if self.rest.match(text, i): return True return False ``` --- ## And We Test It ```py def test_any_matches_empty(): # /*/ matches "" assert Any().match("") def test_any_matches_entire_string(): # /*/ matches "abc" assert Any().match("abc") def test_any_matches_as_prefix(): # /*def/ matches "abcdef" assert Any(Lit("def")).match("abcdef") def test_any_matches_as_suffix(): # /abc*/ matches "abcdef" assert Lit("abc", Any()).match("abcdef") def test_any_matches_interior(): # /a*c/ matches "abc" assert Lit("a", Any(Lit("c"))).match("abc") ``` --- ## Matching Alternatives ```py class Either: def __init__(self, left, right, rest=None): self.left = left self.right = right self.rest = rest def match(self, text, start=0): return self.left.match(text, start) or \ self.right.match(text, start) ``` --- ## And We Test It ```py def test_either_two_literals_first(): # /{a,b}/ matches "a" assert Either(Lit("a"), Lit("b")).match("a") def test_either_two_literals_not_both(): # /{a,b}/ doesn't match "ab" assert not Either(Lit("a"), Lit("b")).match("ab") ``` --- ## But Wait… ```py def test_either_followed_by_literal_match(): # /{a,b}c/ matches "ac" assert Either(Lit("a"), Lit("b"), Lit("c")).match("ac") def test_either_followed_by_literal_no_match(): # /{a,b}c/ doesn't match "ax" assert not Either(Lit("a"), Lit("b"), Lit("c")).match("ax") ``` ``` ======================= test session starts ======================== test_glob_problem.py F. [100%] ===================== short test summary info ====================== FAILED test_glob_problem.py::test_either_followed_by_literal_match =================== 1 failed, 1 passed in 0.00s ==================== ``` - `Either` doesn't handle `rest` properly --- ## Rethinking - We now have three matchers with the same interfaces -
Refactor
using
Extract Parent Class
- The test `if self.rest is None` appears several times - Use the
Null Object
pattern instead
--- class: aside ## We Didn't Invent This
-
[
Tichy2010
]
showed that learning these patterns makes people better programmers --- ## The Parent Class ```py class Match: def __init__(self, rest): self.rest = rest if rest is not None else Null() def match(self, text): result = self._match(text, 0) return result == len(text) ``` - Assume every child class has a `_match` method - This method returns the location to continue searching - So `Match.match` checks that we've reached the end of the text --- ## The Null Object Class ```py class Null(Match): def __init__(self): self.rest = None def _match(self, text, start): return start ``` - Must be the last one in the chain - Doesn't advance the match (i.e., does nothing) - Every other class can now delegate to its `next` without checking for `None` --- ## Refactoring Literal Matcher ```py class Lit(Match): def __init__(self, chars, rest=None): super().__init__(rest) self.chars = chars def _match(self, text, start): end = start + len(self.chars) if text[start:end] != self.chars: return None return self.rest._match(text, end) ``` - Ask parent class to do common initialization - Return `None` for "no match" or whatever `self.rest` returns - If `rest` is `Null`, result will be the index after this object's match --- ## Refactoring Wildcard ```py class Any(Match): def __init__(self, rest=None): super().__init__(rest) def _match(self, text, start): for i in range(start, len(text) + 1): end = self.rest._match(text, i) if end == len(text): return end return None ``` - The exercises will ask, "Why `len(text) + 1`?" --- ## Refactoring Alternatives ```py class Either(Match): def __init__(self, left, right, rest=None): super().__init__(rest) self.left = left self.right = right def _match(self, text, start): for pat in [self.left, self.right]: end = pat._match(text, start) if end is not None: end = self.rest._match(text, end) if end == len(text): return end return None ``` - Looping over left and right options is simpler than repeating code or writing a helper method - Could easily be extended to any number of alternatives --- ## Testing - None of the existing tests change - None of the constructors changed - Neither did the signature of `match` - We should (should) add a couple of tests for `Null` - But basically we're done - And we can easily add matchers for other kinds of patterns --- class: summary ## Summary