"What would Greg do? [pause] OK, now that we've ruled that out..."
I wrote a post last July
about using package managers
like RPM, Homebrew, and Conda to track dependencies between lessons,
so that a student could say something like
conda install unit_testing
and get a lesson on unit testing,
along with the code, sample data, and other lessons it depends on.
I also mused that it could help make research more reproducible:
a paper is just a lesson on something that's never been taught before.
This idea isn't new. Konrad Hinsen wrote about using package management for reproducibility back in 2012, and later about why he decided to go a different route. W. Trevor King has written about it as well, while Rémi Emonet and Raniere Silva built a small prototype last summer.
I'm still not sure whether this is a good idea, and since I've always done what passes for my best thinking when I've got something to fix rather than a blank sheet of paper, I've thrown together a really small demo. I'm sure it's wrongheaded in many ways, but I hope it will help focus discussion by giving people something specific to correct. If you'd like to kick its tires:
Make sure you have Python 2.* installed.
Clone this GitHub repository.
Run make on its own to get a list of available commands.
Run make create to create a distribution file.
Run make install to install that package in your Python distribution. You may wish to create a virtual environment before doing this so as not to pollute your Python distribution. However, make install writes a list of installed files to installed-files.txt, so you can run make uninstall to delete them all.
Once the lesson is installed, running lesson view something will open it in your browser. This emulates a learner viewing the lesson locally.
Run mkdir /tmp/stuff (or some other temporary directory) and then lesson files something /tmp/stuff to copy the lesson's code and data into /tmp/stuff. This emulates a learner getting the sample code and data files for the lesson.
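To make the two lesson subcommands concrete, here is a minimal sketch of what they might do. It is not the demo's actual script: the directory layout (an index.html page and a files sub-directory per lesson, under a lessons directory in the Python prefix) and the function names are assumptions made for illustration.

```python
import os
import shutil
import sys
import webbrowser


def lessons_root():
    # Assumption: lessons live in a "lessons" directory under the Python prefix.
    return os.path.join(sys.prefix, "lessons")


def view(name, root=None, opener=webbrowser.open):
    """Open a lesson's main page in the default browser and return its URL."""
    root = root or lessons_root()
    index = os.path.join(root, name, "index.html")  # hypothetical layout
    if not os.path.isfile(index):
        raise IOError("no such lesson: " + name)
    url = "file://" + os.path.abspath(index)
    opener(url)
    return url


def files(name, dest, root=None):
    """Copy a lesson's code and data files into dest; return the names copied."""
    root = root or lessons_root()
    src = os.path.join(root, name, "files")  # hypothetical layout
    copied = []
    for entry in sorted(os.listdir(src)):
        path = os.path.join(src, entry)
        if os.path.isfile(path):
            shutil.copy(path, dest)
            copied.append(entry)
    return copied
```

Under these assumptions, view("something") corresponds to lesson view something, and files("something", "/tmp/stuff") corresponds to lesson files something /tmp/stuff.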
Behind the scenes,
installation uses a standard Python
to create a
lessons sub-directory in your Python distribution
and then copy the lesson material under there.
It also installs a script called
your Python distribution's
A real system would separate these:
people would only install
and each particular lesson would then be packaged and installed separately.
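The install and uninstall behaviour described above (copy the lesson material under a lessons directory, record what was written, and later delete exactly those files) can be sketched roughly as follows. This is an illustration under stated assumptions, not the demo's real installer: the function names and arguments are hypothetical, and only the installed-files.txt file name comes from the post.

```python
import os
import shutil


def install(src_dir, lessons_dir, name, manifest="installed-files.txt"):
    """Copy a lesson under lessons_dir/name and record every file written."""
    written = []
    for folder, _, filenames in os.walk(src_dir):
        rel = os.path.relpath(folder, src_dir)
        target = os.path.normpath(os.path.join(lessons_dir, name, rel))
        if not os.path.isdir(target):
            os.makedirs(target)
        for filename in filenames:
            dest = os.path.join(target, filename)
            shutil.copy(os.path.join(folder, filename), dest)
            written.append(dest)
    # One installed path per line, so uninstalling is just "delete each line".
    with open(manifest, "w") as out:
        out.write("\n".join(written) + "\n")
    return written


def uninstall(manifest="installed-files.txt"):
    """Delete every file listed in the manifest, then the manifest itself."""
    with open(manifest) as src:
        paths = [line.strip() for line in src if line.strip()]
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
    os.remove(manifest)  # empty directories are left behind in this sketch
    return paths
```

Keeping a manifest is what makes a clean uninstall possible without a package database: the installer only has to remember what it wrote.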
This little demo doesn't specify any dependencies, so it doesn't install any supporting tools or prerequisite lessons. That would be straightforward to add, but that's another way of saying, "We don't need to think about it right now." What we do need to think about is: how to handle lessons for R, the shell, GitHub, and so on, and whether Python's packaging tools are the right platform for this. I'm pretty sure the answer to the second question is "no", but alternatives are either OS-specific, require more effort at first encounter than most lesson authors will be willing to invest, or both.
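For a sense of what adding dependencies would involve, here is a small sketch that works out an installation order in which prerequisite lessons come first. The demo does not do this; the requires mapping and every lesson name other than unit_testing are hypothetical, and a real system would presumably lean on the package manager's own resolver.

```python
def install_order(lesson, requires):
    """Return lessons in an order where prerequisites come before dependents."""
    order = []
    state = {}  # lesson name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("circular dependency involving " + name)
        state[name] = "visiting"
        for prerequisite in requires.get(name, []):
            visit(prerequisite)
        state[name] = "done"
        order.append(name)

    visit(lesson)
    return order


# Hypothetical prerequisite data, for illustration only.
requires = {
    "unit_testing": ["python_functions", "shell_basics"],
    "python_functions": ["python_basics"],
}
print(install_order("unit_testing", requires))
# ['python_basics', 'python_functions', 'shell_basics', 'unit_testing']
```

The depth-first walk also detects circular prerequisites, which lesson authors could easily introduce by accident.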
The long-term goal of this work is to create something like
Like those archives,
it would require people to package their lesson in a particular way.
Once they'd done that,
their work would be easier to find and use.
And as I said at the outset,
if we can make this work for lessons,
there's no reason we can't make it work for papers.
(I for one would have been grateful if I could have run
pip install doi://arxiv.org/1111.1111
to get a local, runnable copy of the paper I'm supposed to be reviewing right now.)
Packaging and distribution is a headache and a nightmare and one of practical computing's greatest unsolved problems, but if we want to work through someone's lesson, or reproduce and extend a colleague's research, we have to get the raw material installed somehow. Today's packaging systems pay much less attention to docs than they do to code; I think that making the former a first-class citizen would be an interesting experiment, and I'd be grateful if you could comment on this post to tell me what you think.
This post originally appeared in the Software Carpentry blog.