While working on an outline of a new lesson on Python, I began thinking about the overall coherence of what we teach. In particular, I started to worry that we might be teaching some things because we teach them, i.e., that the curriculum might lose its connection to researchers' actual issues.

One method for keeping things grounded in the other field I still occasionally work in (empirical software engineering) is called Goal, Question, Metric. As the name suggests, it asks three questions: what are you trying to achieve, what questions do you need answered in order to achieve it, and what metrics will you accept as answers to those questions? An educational equivalent is Question, Answer, Lesson: what questions do novices have, what answers do competent practitioners give them, and what lessons are needed to teach those answers? (The "do novices have" modifier is crucial: in order for our workshops to be appealing, they must answer the questions that novices actually have, not the ones we wish they would ask.)

Here's what I've come up with so far:

Questions:

  - How can I choose what tool to use?
  - How can I get help/fix this?
  - How can I get started?
  - How can I work in a team?
  - How can I make my software more useful?
  - How can I get my software to do more?
  - How can I make my work reproducible?
  - How can I get the right answer?
  - How can I understand the project I've inherited?

Answers:

  - Automate tasks and analyses.
  - Avoid duplication.
  - Be welcoming.
  - Choose the right visualization.
  - Program defensively.
  - Document intention, not implementation.
  - Use the experimental method.
  - Modularize software.
  - Normalize data.
  - Be open by default.
  - Organize projects consistently.
  - Do pre-commit reviews.
  - Publish software and data.
  - Reduce, re-use, recycle.
  - Create re-runnable tests.
  - Search the web.
  - Store raw data as it arrived.
  - Tune programs.
  - Understand data formats.
  - Understand error messages.
  - Understand how programs run.
  - Use checklists and to-do lists.
  - Use configuration files.
  - Use more hardware.
  - Use version control.

Lessons:

  - Data Management
  - Managing Software
  - Authoring and Publishing
  - Quality Assurance
  - Unix Shell
  - Version Control

But by themselves, these three lists aren't very useful. What really matters is the connections between them: which answers address which questions, and which lessons teach the ideas used in those answers? The obvious way to represent this is as a graph, since both relationships are many-to-many. So far, though, I haven't produced anything better than this:
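To make the structure concrete, the two many-to-many relationships can be sketched as adjacency sets in Python. The entries below are a small subset of the lists above, and the links between them are illustrative guesses, not the actual mapping in the diagram:

```python
# Sketch of the question -> answer -> lesson graph as adjacency sets.
# The specific links are illustrative, not the real mapping in the diagram.
answers_for = {
    "How can I get the right answer?": {"Program defensively",
                                        "Create re-runnable tests"},
    "How can I make my work reproducible?": {"Use version control",
                                             "Use configuration files"},
}

lessons_for = {
    "Program defensively": {"Quality Assurance"},
    "Create re-runnable tests": {"Quality Assurance"},
    "Use version control": {"Version Control"},
    "Use configuration files": {"Managing Software"},
}

def lessons_answering(question):
    """Follow question -> answers -> lessons and return all lessons reached."""
    result = set()
    for answer in answers_for.get(question, set()):
        result |= lessons_for.get(answer, set())
    return result

print(lessons_answering("How can I make my work reproducible?"))
```

Traversing the graph like this makes gaps easy to spot: a question whose traversal returns an empty set is one the curriculum doesn't yet address.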

Questions, Answers, and Lessons

(You can click on the image to see the full thing, or look here for the GraphViz source: run dot -Tsvg design-01.gv > design-01.svg to regenerate the SVG. Note that I've added a fourth column to the graph to show the half-day modules within each lesson, primarily to give a sense of how much time would be devoted to what.)

Drawing up these lists has already helped me figure out what we might teach in a two-week Carpentry-style class (a long-standing dream of mine), but:

  1. I'm pretty sure these still aren't the questions novices actually have, and

  2. as presently drawn, the graph is unreadable.

The first is more important right now than the second, so I would be grateful for feedback to go with what I've already received from Jackie Kazil, Noam Ross, Karen Cranston, and Andromeda Yelton. Please add comments to this post about which questions you'd add, delete, or change, and what you think the answers should be.

This post originally appeared in the Software Carpentry blog.