How I Explain Code
I recently built a very small static site generator because… well, the reasons don’t matter (and aren’t actually compelling), but sharing it with someone gave me an opportunity to think about how I explain my code to someone else. The explanation is below and the code follows; I hope it’s useful.
-
Start with the
main()
function- Parse command-line arguments and store the result in
opt
- Create an
Environment
for loading templates (explained below) - Create a set called
skip
of top-level directories to ignore - Find all the files that need to be rendered or just copied
- For each file:
- If it’s a Markdown file, render it
- Otherwise, copy it
- Parse command-line arguments and store the result in
-
So what’ an “environment”?
- This program uses Jinja2 to stuff content into HTML templates
- Jinja2 needs to know where to find the HTML templates
- So we create an
Environment
with aFileSystemLoader
and tell thatFileSystemLoader
where to look opt
holds the result of parsing command-line arguments, andopt.templates
is the path to the templates directory
-
Now let’s take a look at
find_files()
- It returns a dictionary whose keys are file paths and whose values are the contents of those files
- It uses a dictionary comprehension to do this instead of a loop
- First line of the comprehension specifies key and value
- Second line is what to look at
- Third is the condition for inclusion
- We use
read_file()
to read a file- If the file is a text file, use
read_text()
- Otherwise, use
read_bytes()
(e.g., for images)
- If the file is a text file, use
- To find files we might be interested in we use a “glob”
- The name is short for “global” and is old-fashioned Unix terminology
opt.root
is the root directory of our project- The pattern
**/*.*
matches everything in subdirectories (**/
) with a two-part name (*.*
)
- The condition in the dictionary comprehension uses a function
is_interesting_file()
- If you’re not a file, you’re not interesting
- If your name starts with
.
(as in.gitignore
), you’re not interesting - If your suffixes isn’t in
SUFFIXES
(defined at the top of the file), you’re not interesting - If your parent directory’s name starts with
.
(as in.git
), you’re not interesting - If we have a set of things to skip and your path starts with one of those things, you’re not interesting
- Otherwise, OK, fine, you’re interesting
-
Back to
main()
and the loop over files- If the file isn’t Markdown, we just copy it
- …after making sure the output directory exists
-
And finally,
render_markdown()
- Use a library function
markdown()
to convert Markdown to HTMLMARKDOWN_EXTENSIONS
(defined at the top of the file) is a list of extra features we want to enable, such as Markdown tables
- Next, load the
page.html
template- A more sophisticated tool would allow authors to specify a template in the header of the Markdown file or as a command-line argument
- Then ask the template to take the HTML produced from the Markdown
and stuff it into the HTML template we just loaded
page.html
has a placeholder to show where
- Use a library function
-
We could stop here, but we want to fix up the generated HTML a bit
- So we parse the HTML produced by Jinja2 using Beautiful Soup to get a tree of objects in memory (as opposed to a great big string)
- And then apply several functions to it in order
- Each of these functions take the current document tree and returns a modified one
- Once that’s done, we make sure the output directory exists
and then write the document tree as HTML text
- Potentially renaming it from upper case to lower case
-
So what are these transformations?
do_markdown_links()
finds hyperlinks to.md
files and turns them into hyperlinks to.html
files- We write hyperlinks to
.md
files in the source so that files will inter-link correctly when viewed on GitHub - But our final website will have
.html
files, so we need to convert
- We write hyperlinks to
do_root_path_prefix()
looks for links whose names start with@root
and turn them into relative links up to the root directory of the project- For example,
@root/static/page.css
becomes./static/page.css
if the Markdown file in in the root directory of the project but../../static/page.css
if the file is two levels down
- For example,
do_title()
finds theH1
heading in the HTML page and copies its contents into the<title>
element of the page- We can easily add more transformation functions
- And in fact the full version of this program has transformers to handle bibliography citations and glossary references
-
In answer to a couple of questions:
- I organize code more-or-less alphabetically out of habit; I know people who try to organize in calling order, but that breaks down as soon as you have utility functions that are called in multiple places.
- I explain code more or less in execution order
"""Convert a pile of Markdown files to HTML."""
import argparse
from bs4 import BeautifulSoup
from jinja2 import Environment, FileSystemLoader
from markdown import markdown
from pathlib import Path
MARKDOWN_EXTENSIONS = ["attr_list", "def_list", "fenced_code", "md_in_html", "tables"]
SUFFIXES_BIN = {".ico", ".jpg", ".png"}
SUFFIXES_SRC = {".css", ".html", ".js", ".md", ".py", ".sh", ".txt"}
SUFFIXES_TXT = SUFFIXES_SRC | {".csv", ".json", ".svg"}
SUFFIXES = SUFFIXES_BIN | SUFFIXES_TXT
RENAMES = {
"CODE_OF_CONDUCT.md": "code_of_conduct.md",
"CONTRIBUTING.md": "contributing.md",
"LICENSE.md": "license.md",
"README.md": "index.md",
}
def main():
"""Main driver."""
opt = parse_args()
env = Environment(loader=FileSystemLoader(opt.templates))
skips = {opt.out, opt.templates}
files = find_files(opt, skips)
for filepath, content in files.items():
if filepath.suffix == ".md":
render_markdown(env, opt.out, filepath, content)
else:
copy_file(opt.out, filepath, content)
def copy_file(output_dir, source_path, content):
"""Copy a file verbatim."""
output_path = make_output_path(output_dir, source_path)
output_path.parent.mkdir(parents=True, exist_ok=True)
write_file(output_path, content)
def do_markdown_links(doc, source_path):
"""Fix .md links in HTML."""
for node in doc.select("a[href]"):
if node["href"].endswith(".md"):
node["href"] = node["href"].replace(".md", ".html").lower()
def do_root_path_prefix(doc, source_path):
"""Fix @root links in HTML."""
depth = len(source_path.parents) - 1
prefix = "./" if (depth == 0) else "../" * depth
targets = (
("a[href]", "href"),
("link[href]", "href"),
("script[src]", "src"),
)
for selector, attr in targets:
for node in doc.select(selector):
if "@root/" in node[attr]:
node[attr] = node[attr].replace("@root/", prefix)
def do_title(doc, source_path):
"""Make sure title element is filled in."""
doc.title.string = doc.h1.get_text()
def find_files(opt, skips=None):
"""Collect all interesting files."""
return {
filepath: read_file(filepath)
for filepath in Path(opt.root).glob("**/*.*")
if is_interesting_file(filepath, skips)
}
def is_interesting_file(filepath, skips):
"""Is this file worth checking?"""
if not filepath.is_file():
return False
if str(filepath).startswith("."):
return False
if filepath.suffix not in SUFFIXES:
return False
if str(filepath.parent.name).startswith("."):
return False
if (skips is not None) and any(str(filepath).startswith(s) for s in skips):
return False
return True
def make_output_path(output_dir, source_path):
"""Build output path."""
if source_path.name in RENAMES:
source_path = Path(source_path.parent, RENAMES[source_path.name])
source_path = Path(str(source_path).replace(".md", ".html"))
return Path(output_dir, source_path)
def parse_args():
"""Parse command-line arguments."""
parser = argparse.ArgumentParser()
parser.add_argument("--out", type=str, default="docs", help="output directory")
parser.add_argument("--root", type=str, default=".", help="root directory")
parser.add_argument("--templates", type=str, default="templates", help="templates directory")
return parser.parse_args()
def read_file(filepath):
"""Read file as bytes or text."""
if filepath.suffix in SUFFIXES_TXT:
return filepath.read_text()
else:
return filepath.read_bytes()
def render_markdown(env, output_dir, source_path, content):
"""Convert Markdown to HTML."""
html = markdown(content, extensions=MARKDOWN_EXTENSIONS)
template = env.get_template("page.html")
html = template.render(content=html)
transformers = (
do_markdown_links,
do_title,
do_root_path_prefix, # must be last
)
doc = BeautifulSoup(html, "html.parser")
for func in transformers:
func(doc, source_path)
output_path = make_output_path(output_dir, source_path)
output_path.parent.mkdir(parents=True, exist_ok=True)
output_path.write_text(str(doc))
def write_file(filepath, content):
"""Write file as bytes or text."""
if filepath.suffix in SUFFIXES_TXT:
return filepath.write_text(content)
else:
return filepath.write_bytes(content)
if __name__ == "__main__":
main()