We have been teaching software development to computational scientists since 1997. Starting with a training course at Los Alamos National Laboratory, our work has grown into an open source course called Software Carpentry that has been used by companies, universities, and the US national labs, and has had over 100,000 visitors since going live in August 2006. Based on our experiences, we believe the following:
- People cannot think computationally until they are proficient at programming. (Non-trivial coding is also a prerequisite for thinking sensibly about software architecture, performance, and other issues, but that's a matter for another position paper...)
- Like other "knowing how" skills, such proficiency takes so much time to acquire that it cannot be squeezed into existing curricula without either displacing other significant content, or pushing out completion dates. (Saying "we will put a little into every course" is a fudge: five minutes per lecture adds up to three or four courses in a four-year degree, and that time has to come from somewhere.)
- Most universities are not willing to do either of these things. The goal stated in the workshop announcement, "Create better scientists not by increasing the number of required credits," is therefore unachievable.
- In contrast with experimentalists, most computational scientists care little about the reproducibility or reliability of their results. The main reason for this is that journal editors and reviewers rarely require evidence that programs have been validated, or that good practices have been followed—in fact, most would not know what to look for.
- Most scientists will not change their working habits unless the changes are presented in measured steps, each of which yields an immediate increase in productivity. In particular, the senior scientists who control research laboratories and university departments have seen so many bandwagons roll by over the years that they are unlikely to get excited about new ones unless the up-front costs are low, and the rewards are quickly visible.
- The most effective way to introduce new tools and techniques is over an extended period, in a staged fashion, in parallel with actual use—intensive short courses are much less effective (or compelling) than mentorship. Peer pressure helps: if training is offered repeatedly, and scientists see that it is making their colleagues more productive, they will be more likely to take it as well. Team learning also helps: non-trivial scientific programming is a team sport, and mentorship is one of the best training methods we know. In particular, team members with distinct roles rallying around the science is the most successful strategy of all, but does not translate into a classroom setting.
- A mix of traditional and agile methodologies is more appropriate for the majority of scientific developers than either approach on its own: scientists are question-led (which encourages incremental development), but the need for high performance often mandates careful up-front design. Scientists cannot therefore simply adopt commercial software engineering practices, but will have to tailor them to their needs.
- Raising general proficiency levels is the only way to raise standards for computational work, and raising standards is necessary if we are to avoid some sort of "computational thalidomide" in the near future.