After DataCamp fired me in June, I spent a month getting Teaching Tech Together over the finish line. With that out of the way, I started thinking once again about what a full-semester or full-year version of Software Carpentry would cover. My first outline was far too ambitious: after re-reading Data Carpentry for Biologists and talking to half a dozen people, I’ve realized that it would take at least three terms to cover the material I had listed.

My revised breakdown is given below. In broad strokes, the first course covers what every grad student analyzing data ought to know (basically, what the Carpentries cover), the second is for the minority of researchers working on larger projects, and the third is for the even smaller group of people who find themselves running those larger projects.

The outlines for these three courses are now hosted in the Merely Useful organization on GitHub. As always, I’d be very grateful for feedback: you can reach me at gvwilson@third-bit.com.

  • Term 1: basic skills
    • An introduction to tidy data using spreadsheets
    • The basics of Python (lists, loops, conditionals, libraries, and functions)
    • Using the Unix shell (basic commands up to pipes and simple shell scripts)
    • The basics of Git (as a single-user tool to coordinate work between multiple machines)
    • Line-oriented text processing (including the basics of regular expressions)
    • How to publish a static web site using Jekyll and GitHub Pages
    • Simple array manipulations with NumPy
    • Simple data frame manipulations with Pandas
  • Term 2: reproducible and scalable
    • The elements of Python style (including PEP 8 and docstrings)
    • Installing and managing libraries (pip and virtualenv)
    • Automating workflows with Make
    • Testing with pytest
    • Continuous integration
    • A branch-per-feature Git workflow
    • Organizing projects (Noble’s Rules and Taschuk’s Rules)
    • Writing reusable code (using functions as values)
  • Term 3: building a research software commons
    • Where to host projects and how to license them
    • Data provenance
    • Building and sharing packages
    • Doing code review
    • Development methodologies (agile and its kin)
    • Basics of online community organization and governance
    • Recruiting and mentoring volunteers
    • How to get the word out