Programmers follow many conventions to make code repos more usable,
from having README, LICENSE, and CONDUCT files in the root directory
to naming conventions, package structure, and having make test or npm run test do something useful.
Every time I see an aggregation like this,
I wish we had similar interoperability/discoverability conventions for tutorials.
That doesn’t mean “one template to rule them all”
any more than it means every software project must have exactly the same structure,
but imagine a world in which every tutorial had a GLOSSARY file beside its README and LICENSE files
so that you could easily find out what terms that lesson defined
(which in turn would tell you what it was about).
Or imagine a convention for structuring exercises
similar to the conventions that programming languages have for structuring tests
(where they are, what they’re called, etc.).
There are standards for creating and packaging learning objects for use with learning management systems,
but they are much too burdensome for most free-range teaching: it’s as if our only options were chaos or Enterprise Java.
I hope some day that creating a lesson on GitHub is as easy
(i.e., as well supported both technically and socially)
as creating a Node project.
One way forward could start with a change of perspective:
instead of thinking of a package as “code with some docs”,
think about it as “docs with some supporting code”.
What would we have to add to turn a site like https://readthedocs.org into a lesson hub?
Auto-grading is a distraction in this context—it’s certainly not a must-have,
not even for coding lessons.
(Proof: textbooks have been around and useful for centuries without auto-grading.)
But think about things like
consistent numbering of exercises and figures across multiple source documents.
Think about learning objectives that can be put in each individual lesson,
but extracted and collated as a summary.
All of these things can be done with existing tools,
but they’re all harder and more fragile than documenting the parameters to a method
or cross-referencing classes and their ancestors/descendants.
(If your “solution” requires people to indent their YAML, it’s not a solution.)
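As a rough illustration of the "extract and collate" idea for learning objectives, here is a minimal Python sketch. It assumes a made-up convention in which each objective sits on its own line starting with `Objective:`; that marker and the function names are hypothetical, not an existing standard.

```python
def collect_objectives(lessons, marker="Objective:"):
    """Gather marked lines from each lesson into one summary.

    `lessons` maps a lesson title to its text. `marker` is a hypothetical
    convention chosen for this sketch, not an existing standard.
    """
    summary = {}
    for title, text in lessons.items():
        summary[title] = [
            line[len(marker):].strip()
            for line in text.splitlines()
            if line.startswith(marker)
        ]
    return summary


def render_summary(summary):
    """Render the collated objectives as a numbered plain-text outline."""
    lines = []
    for title, objectives in summary.items():
        lines.append(title)
        lines.extend(f"  {i}. {obj}" for i, obj in enumerate(objectives, start=1))
    return "\n".join(lines)


lessons = {
    "Loops": "Objective: write a for loop\nSome text.\nObjective: trace a loop by hand",
    "Functions": "Intro.\nObjective: define a function",
}
summary = collect_objectives(lessons)
print(render_summary(summary))
```

The point is only that the objectives live in the lesson and the summary is generated, so the two cannot drift apart silently.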
And yes, the analogy with reusing code is flawed—the Reusability Paradox
is very real—but I believe discoverability and interoperability are worthwhile, achievable goals.
How can I tell at a glance that a site or repo is a lesson rather than [noun]?
What about the (human) language it’s in?
Ditto its major topics.
Much of the discussion that followed on Twitter and by email took the form,
“If we build a good enough template, people will adopt it.”
I think that if better templates were the solution,
they would have worked by now.
I have written several myself,
including the first version of the Carpentries template;
I’ve used several others,
and a search on GitHub turns up dozens of others.
The problem is that templates help present the lesson but don’t help write it.
All of the templates I’ve looked at allow authors to write whatever they want for learning objectives.
They don’t help connect those objectives together
or check that the objectives are still consistent with the lesson’s content:
“still” because lessons evolve over time,
and people forget to check and update objectives
just as they forget to check and update documentation and installation instructions for code.
Just as UML doesn’t help developers solve the hard parts of their problems,
static website frameworks don’t help teachers any more than a Microsoft Word or Google Doc template.
That’s a slight exaggeration:
templates do help keep common values like the author’s email address consistent.
However, the cost is very high:
installing Jekyll or Hugo,
writing YAML to configure them,
and getting the right plugins in place (never mind keeping them up to date)
is a hell of a cover charge.
The payoff seems small,
particularly if you aren’t already steeped in the technology,
so on any given Thursday,
most teachers really are better off using Word or Docs.
The Glosario project
is an experiment to see if we can do better.
Suppose you’re writing a lesson about data science using Markdown.
Instead of using bold or [an arbitrary link](http://somewhere/#term) to highlight a definition,
refer to [the online glossary](https://glosario.carpentries.org/en/#term) (in your preferred language), or
call an inline function like gloss('key', 'inline text') to generate that text and link
(if you’re using something like R Markdown).
In both cases,
tools can now tell which terms a particular lesson defines.
Those terms can then be put in the <head> of a web page,
stuffed in a database,
or added to a summary page for the course as a whole.
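To make that concrete, here is a minimal Python sketch of such a tool. It assumes glossary links have exactly the shape shown above (`https://glosario.carpentries.org/<lang>/#<key>`); the regular expression and the function names `defined_terms` and `keywords_meta` are illustrative, not part of Glosario itself.

```python
import re

# Assumed link shape: [text](https://glosario.carpentries.org/<lang>/#<key>)
GLOSSARY_LINK = re.compile(
    r"\[([^\]]+)\]\(https://glosario\.carpentries\.org/([a-z]{2})/#([\w-]+)\)"
)


def defined_terms(markdown_text):
    """Return the sorted, de-duplicated glossary keys a lesson links to."""
    return sorted({m.group(3) for m in GLOSSARY_LINK.finditer(markdown_text)})


def keywords_meta(terms):
    """Render the keys as a <meta> tag suitable for a page's <head>."""
    return '<meta name="keywords" content="{}">'.format(", ".join(terms))


lesson = (
    "A [data frame](https://glosario.carpentries.org/en/#data_frame) holds "
    "[tidy data](https://glosario.carpentries.org/en/#tidy_data), and each "
    "column of a [data frame](https://glosario.carpentries.org/en/#data_frame) "
    "has one type."
)
terms = defined_terms(lesson)
print(terms)
print(keywords_meta(terms))
```

The same list of keys could just as easily be written to a database row or a course-wide summary page.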
Doing this accomplishes two things:
It makes the lessons more discoverable.
It allows us to build another tool for stitching lessons together.
People will not manually highlight all of the terms they use in a lesson that they don’t define:
it’s too much work,
and requires too much re-work as the glossary evolves.
However, it would be relatively straightforward to compare the text of a lesson against a glossary
to determine what terms it depends on
so that we can say, “You probably want to look at lessons X and Y before tackling lesson Z.”
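A minimal sketch of that comparison, in Python, might look like the following. It assumes we already know which terms each lesson defines (for example, from its glossary links) and handles only single-word terms; the data layout and the function names are hypothetical.

```python
import re


def words_in(text):
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def prerequisites(lessons):
    """Map each lesson to the lessons that define terms it uses but does not define.

    `lessons` maps a lesson name to {"defines": set of terms, "text": str}.
    Only single-word, lower-case terms are handled in this sketch.
    """
    result = {}
    for name, lesson in lessons.items():
        used = words_in(lesson["text"]) - lesson["defines"]
        result[name] = sorted(
            other
            for other, info in lessons.items()
            if other != name and used & info["defines"]
        )
    return result


lessons = {
    "X": {"defines": {"variable"}, "text": "A variable stores a value."},
    "Y": {"defines": {"loop"}, "text": "A loop repeats steps using a variable."},
    "Z": {"defines": {"function"}, "text": "A function can wrap a loop and a variable."},
}
print(prerequisites(lessons))
```

A real tool would need stemming, multi-word terms, and some way to ignore everyday uses of a word, which is part of why this is harder than it looks.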
No templating engine I know of will do this right now.
Most of them won’t even do simpler but equally necessary things
like create two-part IDs for the figures within chapters
and then fill in references with chapter.figure.
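For the record, the numbering itself is not the hard part. The Python sketch below assumes a made-up markup in which `[[figure:slug]]` declares a figure and `[[ref:slug]]` refers to one; both tags and the function name are hypothetical and belong to no existing templating engine.

```python
import re

# Hypothetical markup for this sketch only.
FIGURE = re.compile(r"\[\[figure:([\w-]+)\]\]")
REF = re.compile(r"\[\[ref:([\w-]+)\]\]")


def number_figures(chapters):
    """Assign chapter.figure numbers, then fill in every reference.

    `chapters` is a list of chapter texts in reading order. A reference to
    an undeclared figure raises KeyError rather than being silently ignored.
    """
    labels = {}
    for chap_num, text in enumerate(chapters, start=1):
        for fig_num, match in enumerate(FIGURE.finditer(text), start=1):
            labels[match.group(1)] = f"{chap_num}.{fig_num}"

    def fill(pattern, template, text):
        return pattern.sub(lambda m: template.format(labels[m.group(1)]), text)

    return [
        fill(REF, "Figure {}", fill(FIGURE, "Figure {}:", text))
        for text in chapters
    ]


chapters = [
    "[[figure:setup]] Install steps.\n[[figure:layout]] Project layout.",
    "As [[ref:layout]] shows, files are grouped.\n[[figure:pipeline]] The pipeline.",
]
numbered = number_figures(chapters)
print("\n".join(numbered))
```

The labels are collected in a first pass over all chapters so that a reference can point forward or backward across files.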
Yes, I’ve used GitBook.
No, tools like it are not solutions for the 99.9% of our species
that can’t or won’t spend their afternoons messing with Pandoc templates
and conflicts between multiple YAML configuration files.
So here’s what I’d like,
and I do think it would be a great project for someone doing a PhD in HCI:
finish the tooling for Glosario,
fold it into lessons written with the template of your choice,
and tell us what you’ve learned about how to build it
and about whether authors, instructors, and learners actually find it useful.
It may seem like a small thing,
but I bet it’s harder than it looks,
and I bet we’ll learn a lot more from doing it than we expect.
I think it would be a good first step toward shifting our attention from the page to the lesson.
Most people programming today have never punched a card,
but all programming editors still treat code as lines of text—in other words,
as if it still might have to fit onto punchcards.
As I’ve been ranting for a while now,
this is holding us back in ways we can barely recognize.
One example is YAML,
the insistence that people must write complex nested data structures as indented lines of text.
The rules are well-defined and simple cases are simple,
but as anyone who has spent an hour wrestling with a Jekyll or Bookdown configuration file can attest,
any complex case is an unproductive nightmare waiting to escape its cage.
So here’s a thought experiment.
Imagine that every editor from Notepad
to VS Code
automatically displayed CSV files as editable tables.
Instead of editing nested data as indented text,
people would edit it in a table,
and if programmers could trust everyone’s favorite editor
to render rows and columns as rows and columns,
I believe that
(1) most people would choose to use it instead of JSON, YAML, TOML, and STUMBL
(OK, I made that last one up, but you weren’t sure, were you?) because
(2) they would find it easier to read and write nested structures
if their editor gave them even this little bit of help and guidance.
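One way nested data could be laid out as rows and columns is sketched below in Python. It is only an assumption about what such a table might look like: each leaf value becomes one row whose first cell is a dotted path, so keys that themselves contain dots are not supported, and every value comes back as a string.

```python
import csv
import io


def flatten(data, prefix=()):
    """Yield (dotted_path, value) rows for every leaf in nested dicts."""
    for key, value in data.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield ".".join(path), value


def to_csv(data):
    """Write the flattened rows as a two-column CSV table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "value"])
    writer.writerows(flatten(data))
    return buffer.getvalue()


def from_csv(text):
    """Rebuild the nested dict; all values come back as strings."""
    result = {}
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header row
    for path, value in rows:
        node = result
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


config = {
    "title": "My Lesson",
    "author": {"name": "A. Teacher", "email": "a@example.org"},
}
table = to_csv(config)
print(table)
```

With this layout, indentation no longer carries meaning, so a misplaced space cannot change the structure.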
But I don’t have any evidence for my second claim,
which is where you (the ambitious grad student looking for a project) come in.
Is there a difference in frustration quotient between YAML-in-a-text-editor
and the same data in a table editor?
Do people like the experience better if the table editor lives inside their usual editor?
And can people find bugs faster or more reliably
if nested structures are presented as tables rather than as indented text?
My bet is “yes” for all three,
but I don’t want you to trust me
because I don’t want you to trust people—I want you to trust data.
And of course once this is working,
the next experiment would be to add a tree editing widget to several common programming editors
and see if it’s better, worse, or the same.
[ a | b ] is my way of showing two editable cells side by side,
and for fairness’ sake I think it’s essential to add these widgets to existing editors:
many programmers will change operating system, citizenship, and gender
before abandoning Emacs.
I have to fix two bugs in the examples,
draw 91 diagrams,
fill in an appendix on cognitive load,
and then revise all 52,000 words,
but the first draft is done.
Feedback would be greatly appreciated:
you can mail me
or file issues in the book’s GitHub repository.
Using callbacks to manipulate files and directories
Using promises to manage delayed computation
Testing software piece by piece
Archiving files with directory structure
Loading, saving, and manipulating tables efficiently
Using patterns to find things in data
Turning text into code
Generating HTML pages from templates
Updating files that depend on other files
Figuring out what goes where in a web page
Managing source files that have been broken into pieces
Loading source files as modules
Checking that code conforms to style guidelines
Modifying code to track coverage and execution times
Generating documentation from comments embedded in code
Turning many files into one
Getting and installing packages
Assembling and running low-level code
Running programs under the control of a breakpointing debugger
By popular request,
here is my spouse’s recipe for pickled carrots,
which is derived from the one in Topp and Howard’s excellent
The Complete Book of Small-Batch Preserving.
Note that this is a canning recipe, not a cold pickle.
Note also that the original recipe calls for only ¼ tsp of hot pepper flakes per jar,
to which we say, “Bah.”
finely chopped fresh oregano or 1 tbsp (15 mL) dried
hot pepper flakes (per jar)
small cloves garlic
peeled baby carrots
1 ½ cups
Remove the hot jars from the canner and add one garlic clove and your desired volume of chili flakes to each jar.
Pack in the carrots (see picture) leaving 1cm (½ inch) of head space.
Combine vinegar, sugar, water, and salt in a small saucepan and bring to a boil.
Pour hot liquid over carrots (in jars) to within ½ inch of the top.
Process for canning: 15 minutes for 500 mL jars.
A 2lb (approx. 1kg) bag of baby carrots will make 1750 mL of pickle.
We usually use 250 mL wide-mouth jars—they make great gifts.
Amira completed a master’s in library science five years ago
and has since worked for a small aid organization.
She did some statistics during her degree,
and has learned some R and Python by doing data science courses online,
but has no formal training in programming.
Amira would like to tidy up the scripts, data sets, and reports she has created
in order to share them with her colleagues.
These lessons will show her how to do this.
completed an Insight Data Science fellowship last year after doing a PhD in Geology
and now works for a company that does forensic audits.
He uses a variety of machine learning and visualization packages,
and would now like to turn some of his own work into an open source project.
This book will show him how such a project should be organized
and how to encourage people to contribute to it.
became a competent programmer during a bachelor’s degree in applied math
and was then hired by the university’s research computing center.
The kinds of applications they are being asked to support
have shifted from fluid dynamics to data analysis;
this guide will teach them how to build and run data pipelines
so that they can pass those skills on to their users.
We organized the book around a running example:
the verification of Zipf’s Law
using a set of classic English novels
in an open, reproducible, and sustainable way.
(People often conflate these three ideas,
but they are distinct.)
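For readers who have not met it, Zipf’s Law says that a word’s frequency is roughly inversely proportional to its rank, so rank times count should stay roughly constant. The Python sketch below is a toy check of that relationship, not the book’s own code; the function names and the tiny sample text are made up for illustration.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_products(table):
    """Under Zipf's Law, rank * count should be roughly constant."""
    return [rank * count for rank, _, count in table]


# A contrived sample that follows the law exactly; real novels will not.
sample = "the " * 6 + "of " * 3 + "a " * 2
table = rank_frequency(sample)
print(table)
print(zipf_products(table))
```

On real text the products drift, which is why checking the law properly means fitting a curve rather than eyeballing a list.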
To do that,
we teach readers to do these things:
Use the Unix shell to efficiently manage data and code.
Organize small and medium-sized data science projects.
Write Python programs that can be used on the command line.
Use Git and GitHub to track and share work.
Work productively in a small team where everyone is welcome.
Use Make to automate complex workflows.
Enable users to configure software without modifying it directly.
Test software and know which parts have not yet been tested.
Find, handle, and fix errors in code.
Publish code and research in open and reproducible ways.
Create Python packages that can be installed in standard ways.
The order is important because later skills depend on earlier ones,
but also because we want people to be able to stop part way through
and still have a workable research process.
If you only get through the fourth chapter,
you’ll be able to back up your work,
share it with others,
and re-run analyses with a single command.
Another chapter and you’ll be ready to collaborate in a team;
one more after that,
and you’ll be able to capture your workflows in re-runnable ways.
We don’t believe any one book can serve everyone’s needs,
but we hope this one will help people who already know how to write a bit of code
figure out what to learn next and what “done” looks like.
The HTML version of the book will stay online for free and forever;
we’ll advertise the printed and e-book versions as soon as they become available.
If you find it useful,
please let us know
(and please also let us know about any errors or murky wording you stumble over).