What Success Looks Like Five Years Out

Posted 2011-12-24

Having talked about what I've learned and how well our teaching measures up, I'd like to explore what success would actually look like for Software Carpentry. Our long-term objective is to make productive and reliable computing practices the norm in science and engineering. My model is the way statistics became a "normal" part of most scientists' training in the 20th Century. Most psychologists and geologists are emphatically not statisticians, but most of them know (or at least have known, at some point) what a t-test is, what a correlation coefficient means, and how to tell when they're out of their depth. Equally, most scientists should know how to pipe a few simple filters together to clean up their data, how to use version control to track and share their work, whether their computational results are as trustworthy as their experimental results [1], and when to go and talk to either a real numerical analyst or a professional software developer.

On a five-year timescale, this translates into three concrete goals:

We are helping thousands of people every year.
We know that what we're doing is helping.
We have a double-digit bus factor.

We are helping thousands of people every year. Right now, we change the lives of a few dozen each year, certainly no more than a hundred. In order to scale up, we need:

dozens of people running Hacker Within-style workshops in labs and at universities, and
dozens more contributing content (particularly exercises), answering questions, offering one-to-one support [2], etc.

We need to do a lot to make this happen: create meta-content to tell people how best to teach and learn the actual content, build a distributed community, and so on. The key piece, though, is some kind of institutional recognition for participation, such as non-credit transcript items for grad students who organize workshops. Self-sacrifice for the general good only goes so far: given a choice between cranking out one more paper, however obscure, or organizing a workshop, most scientists have to do the former (at least until they have tenure), because the latter isn't taken into account by promotion scoring formulas. I think Mozilla's Open Badging Initiative is a good solution to the general problem of accrediting non-classroom competence, but we need to find a way to translate that into something that universities and industrial research labs can digest.

Why not just get schools to offer Software Carpentry as a credit course? We've been down that road, and the short answer is, they mostly won't. Every curriculum is already over-full; if a geology department wants to run a course on programming skills, they have to cut thermodynamics, rock mechanics, or something else. I might believe our stuff is more important, but existing scientific disciplines don't consider it part of their core (not even computer science). Until we build a critical mass of supporters [3], we'll have to work in the institutional equivalent of that windowless room in the basement that still smells like the janitorial storage closet it once was.

We know that what we're doing is helping. Testimonials from former students are heartwarming, but how do we know we're actually teaching the right things? I.e., how do we know we're actually showing scientists and engineers how to do more with computers, better, in less time? The only honest way to answer the question would be to do some in-depth qualitative studies of how scientists use computers right now in their day-to-day work, then go back and look at them weeks or months after they'd had some training to see what had changed.

I think this kind of study has to be done qualitatively, through in-person observation, for two reasons:

We ran the largest survey ever done of how scientists develop and use software. Almost 2000 people responded to dozens of questions, and while we discovered a lot about when and how they learn what they do know, it didn't tell us anything about how much they know. The problem was calibration: if someone says, "I'm an expert at Unix," does that mean, "I've hacked on the Linux kernel" or "I use a Mac when everyone around me uses a PC"? Follow-up questions uncovered both interpretations; without some probing one-to-one analysis as a foundation, we have no way to estimate their distribution.
Our goal isn't really to change how quickly people do the same old things; it's to change the things they do. Someone who understands ANOVA probably designs and runs their experiments differently (and more effectively) than someone who doesn't; similarly, someone who understands the value of unit testing probably writes more reusable and more reliable code. Time-on-task measurements don't reveal this.

As I've said many times before, though, nobody seems to want to fund this kind of study. I've personally applied to three different government agencies and four different companies, without luck, and I know other people who have similar stories. It wouldn't take much: $50K to get started, or even $250K to do the whole thing properly, is peanuts compared to what those same agencies and companies are throwing into high-performance computing, or to the cost of the time scientists are wasting by doing things inefficiently. But as I've also said many times before, most people would rather fail than change…

Ranting aside, Software Carpentry needs to do this kind of study on a regular (ideally, continuous) basis in order to keep its offerings relevant and useful. That means we need to have stable long-term funding, and that brings us to the third (and possibly most important) point:

We have a double-digit bus factor. Software Carpentry is still essentially a one-person show. If we're to help more people and do the kinds of studies needed to find out if we are actually helping, we need more people to get involved. I dislike the phrase "building community", but that's what we need to do.

[1] See Cameron Neylon's post on good practice in research coding for a thought-provoking discussion of what it's fair to expect from scientists.

[2] Support could be either classical tech support ("Share your desktop with me for a minute, and I'll see if I can figure out why the Python Imaging Library won't install with JPEG support on your machine"), or personalized feedback and assessment ("I just looked at your solution to exercise 22, and here's what I think"). As I said in the previous post in this series, research has shown that this kind of "deep feedback" is crucial to real learning, but it is exactly what automated grading of drill exercises doesn't provide.

[3] We actually only need supporters in two camps: journal editors (who could insist that scientists provide their code for peer review, and that those reviews actually get done), and funding bodies (who could insist that code actually be reusable). Given those requirements, it would be in most scientists' own interests to improve their computing practices. Oh, and I'd also like a pony for Christmas…
