Long-time readers of this blog and our discussion list will know that I'm unhappy with the choices we have for formatting our lessons. Thanks to a tweet from Karl Broman, I may have an answer. It's outlined below, and I'd be grateful for comments on usability and feasibility.
Here's a summary of the forces we need to balance:
- People should be able to write lessons in Markdown. We choose Markdown rather than LaTeX or HTML because it's easier to read, diff, and merge; we choose it rather than AsciiDoc or reStructuredText (reST) because it's much better known.
- People should be able to preview their lessons locally before publishing them, both to avoid embarrassment and because many people compose offline.
- Lessons should be easy to write and read. We shouldn't require people to put `div`s and other bits of HTML in their Markdown.
- It should be easy to add machine-comprehensible structure to lessons. We want to be able to build tools to extract lesson titles, count challenge exercises, etc., all of which requires machine-comprehensible source. This is in tension with the point above: everything we do to make lessons more readable to computers means extra work or less readability for people.
- We should use only off-the-shelf tools. We don't want to have to build, document, and maintain custom plugins for formatting tools.
We do want to use GitHub's
- The workflow for creating and publishing lessons should be authentic, i.e., the way people write and publish lessons should be a way they might use to write and publish research papers.
And here's the proposal:
- We stop relying on Jekyll and start using Pandoc instead.
- Every lesson is stored in a GitHub repository that has a `gh-pages` branch. (GitHub will automatically publish the files in that branch as a mini-website.)
- The root directory of that repository contains:
  - a `README.md` file with a one-liner about the lesson's content and authorship;
  - a sub-directory called `src` that contains the source files for the lesson;
  - the compiled versions of those files; and
  - an empty file called `.nojekyll` to tell GitHub that we don't want it to run Jekyll.
- The `src` directory contains all the source files for the lesson, and a simple `Makefile` that uses Pandoc instead of Jekyll to compile those files. Pandoc's output goes in the root directory, i.e., one level above the `src` directory, and the Makefile makes sure that other files (CSS, images, etc.) are copied up as well.
- When an author makes a change, she must build locally, then commit those files to the GitHub repository. Yes, this means that generated files are stored in version control, which is normally regarded as a bad idea. But it does mean we can use Pandoc, which supports a nicer dialect of Markdown than Jekyll on GitHub, and we don't have to worry about compiling files on one branch and committing them to another.
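To make the build step concrete, here is a rough shell sketch of the kind of commands the `Makefile` might run from inside `src`. It is an illustration rather than the actual Makefile: the function names `html_target` and `build_pages`, the `.md` extension, and the choice of HTML output are all assumptions.

```shell
# Sketch only: meant to be run from inside src/.
# Assumed (not stated in the post): sources are *.md files and the
# compiled pages are HTML files with the same base name.

# Map a source file name to the path of its compiled page one level up.
html_target() {
    printf '../%s.html\n' "$(basename "$1" .md)"
}

# Compile every Markdown file in the current directory with Pandoc,
# writing each page into the parent (root) directory.
build_pages() {
    for source in *.md; do
        [ -e "$source" ] || continue   # no Markdown files: nothing to do
        pandoc --standalone --from markdown --to html \
            --output "$(html_target "$source")" "$source"
    done
}
```

Calling `build_pages` inside `src` would turn a source file such as `intro.md` into `intro.html` in the root directory. Copying CSS and images up would need a similar step, which is left out here because the post doesn't say where those files live.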
I've created a proof-of-concept repository
to show what this might look like in practice.
It seems to work pretty well,
and I think it satisfies the "authentic workflow" requirement
(though I'd be grateful if others could tell me it doesn't).
The only usability hiccup I can see is that
authors will have to remember to commit the generated files:
my usual workflow of `git add -A` followed by `git commit -m`
only adds files in or below the current working directory,
so authors would have to `cd ..` up
to the root directory of their local copy of the repo first.
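One possible way to take the remembering out of this is to let Git find the root directory itself. The sketch below is a suggestion, not part of the proposal: `commit_lesson` is a hypothetical helper name, and it relies on `git rev-parse --show-toplevel`, which prints the top-level directory of the working tree.

```shell
# Sketch only: stage and commit everything from the repository root,
# no matter which sub-directory the author is working in.
# commit_lesson is a hypothetical helper name, not an existing command.
commit_lesson() {
    message="$1"
    # Ask Git where the top of the working tree is.
    root="$(git rev-parse --show-toplevel)" || return 1
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$root" || exit 1
        git add -A
        git commit -m "$message"
    )
}
```

An author working in `src` could then run `commit_lesson "Rebuild lesson pages"` and stay where she is, with the generated files in the root directory included in the commit.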
One variation on this, raised by Trevor King, is
to keep the source files in the root directory of the
and have the lesson maintainer merge changes into the
src directory of the
and do the build.
This frees authors from having to install the build tools—only
the maintainers need that—but on balance,
I think most people will want to preview before uploading,
so the savings will be mostly theoretical.
If you have other thoughts, or can suggest other improvements, please add comments to this post. We'd particularly like to hear from people who aren't Git experts or aren't familiar with HTML templating systems, Makefiles, and the like. Does the workflow described above make sense? If not, what do you think would go wrong where, and why?
This post originally appeared in the Software Carpentry blog.