Best Practices for Scientific Computing
The following pre-print is now available on arXiv:
D.A. Aruliah (a), C. Titus Brown (b), Neil P. Chue Hong (c), Matt Davis (d), Richard T. Guy (e), Steven H.D. Haddock (f), Katy Huff (g), Ian Mitchell (h), Mark Plumbley (i), Ben Waugh (j), Ethan P. White (k), Greg Wilson (l), and Paul Wilson (g).
(a) University of Ontario Institute of Technology, (b) Michigan State University, (c) Software Sustainability Institute, (d) Space Telescope Science Institute, (e) University of Toronto, (f) Monterey Bay Aquarium Research Institute, (g) University of Wisconsin, (h) University of British Columbia, (i) Queen Mary University London, (j) University College London, (k) Utah State University, and (l) Software Carpentry.
Abstract
Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.
- Write programs for people, not computers.
  - A program should not require its readers to hold more than a handful of facts in memory at once.
  - Names should be consistent, distinctive, and meaningful.
  - Code style and formatting should be consistent.
  - All aspects of software development should be broken down into tasks roughly an hour long.
- Automate repetitive tasks.
  - Rely on the computer to repeat tasks.
  - Save recent commands in a file for re-use.
  - Use a build tool to automate scientific workflows.
- Use the computer to record history.
  - Software tools should be used to track computational work automatically.
- Make incremental changes.
  - Work in small steps with frequent feedback and course correction.
- Use version control.
  - Use a version control system.
  - Everything that has been created manually should be put in version control.
- Don't repeat yourself (or others).
  - Every piece of data must have a single authoritative representation in the system.
  - Code should be modularized rather than copied and pasted.
  - Re-use code instead of rewriting it.
- Plan for mistakes.
  - Add assertions to programs to check their operation.
  - Use an off-the-shelf unit testing library.
  - Turn bugs into test cases.
  - Use a symbolic debugger.
- Optimize software only after it works correctly.
  - Use a profiler to identify bottlenecks.
  - Write code in the highest-level language possible.
- Document the design and purpose of code rather than its mechanics.
  - Document interfaces and reasons, not implementations.
  - Refactor code instead of explaining how it works.
  - Embed the documentation for a piece of software in that software.
- Conduct code reviews.
  - Use code review and pair programming when bringing someone new up to speed and when tackling particularly tricky design, coding, and debugging problems.
  - Use an issue tracking tool.
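
A few of the practices listed above (meaningful names, documentation embedded in the code, assertions, an off-the-shelf unit testing library, and turning bugs into test cases) can be sketched in a short Python example. This is only an illustration under stated assumptions: the function `mean_temperature`, the test class, and the bug scenario are hypothetical and do not come from the paper. `unittest` is the testing library that ships with Python.

```python
import unittest


def mean_temperature(readings_celsius):
    """Return the arithmetic mean of a non-empty sequence of temperatures.

    The docstring is embedded documentation: it describes the interface
    and purpose of the function, not the mechanics of the computation.
    """
    # Assertion: check an assumption about the input before relying on it.
    assert len(readings_celsius) > 0, "need at least one reading"
    return sum(readings_celsius) / len(readings_celsius)


class MeanTemperatureTest(unittest.TestCase):
    """Tests written with an off-the-shelf library (hypothetical cases)."""

    def test_mean_of_known_values(self):
        # A case small enough to check by hand: (10 + 20 + 30) / 3 = 20.
        self.assertAlmostEqual(mean_temperature([10.0, 20.0, 30.0]), 20.0)

    def test_empty_input_is_rejected(self):
        # A bug turned into a test case: in this hypothetical scenario an
        # empty list once reached the function unnoticed, so the failure
        # is now pinned down by a test that must keep passing.
        with self.assertRaises(AssertionError):
            mean_temperature([])
```

If the code were saved in a file, for example under the hypothetical name `temperature.py`, the tests could be run with `python -m unittest temperature`. Note that Python skips `assert` statements when started with the `-O` flag, so the second test assumes assertions are enabled.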