These ideas are things I might have done if I hadn’t done Software Carpentry. Many are presented as descriptions of books that don’t yet exist because I still believe that a good book can change the world. If you’d like to help with any of them or take one on yourself, please let me know.

Sex and Drugs and Guns and Code: What Everyone in Tech Needs to Know About Politics, Economics, and Power

Inspired by books like Economics for Everyone and The Imposter’s Handbook, this book is aimed at people in tech who want to understand how the world works, but have limited time in which to do it. Rather than science or programming, it introduces ideas and methods that are commonplace in the social sciences, and shows that “the way things are” is neither inevitable nor accidental. From the reasons racial discrimination persists despite its illogical economic inefficiency to the ways in which “flat” organizations actually operate and how cognitive biases affect us all, it gives readers a toolbox for thinking about society and how tech is reshaping it for both better and worse.

The Undergraduate Operator’s Manual

What effect does pulling an all-nighter have on the quality of your work? How do people absorb and retain knowledge? What are some good ways to run a project meeting, and how can you get people to actually pull their weight? Every undergraduate has to deal with these issues and a hundred others, but most of the time, they are expected to rediscover or reinvent methods themselves. This textbook for a one-semester course puts solutions in one place, and by doing so teaches a lot about physiology, psychology, and organizational behavior.

Software Tools in JavaScript

Software Tools and its sequel Software Tools in Pascal introduced a whole generation of programmers to the Unix philosophy of tool-based computing. This book’s starting point is the observation that JavaScript, HTTP, and JSON have taken the place of strings of ASCII and standard I/O. Drawing from sources as diverse as Jon Udell’s “Seven Ways to Think Like the Web”, the Kinetic Rule Language, and the mostly-functional model associated with Elm and Redux, it presents a “new standard model” based on syndication of distributed streams of events, and shows readers how the tools they use are built, and how to build tools of their own.

Research Computing from A to B

Based on the workshops run by Software Carpentry and Data Carpentry, this book is a hands-on introduction to practical computing skills aimed at graduate students and professionals in research-intensive disciplines. The core topics–data management, structured programming, task automation, and version control–are introduced through a series of short tutorials, then elaborated with further lessons on using the web to share data, creating reproducible workflows, cleaning data, and testing software when the right answer isn’t actually known. While it necessarily glosses over many fine points, it gives readers a useful toolkit and a sense of where to go next.

Managing Research Software Projects

Your graduate degree is in ecology, but now you’re running a three-person team responsible for building and maintaining a hundred thousand lines of code? This book is an overview of everything you absolutely, positively need to know after you know how to program: marketing, community management, leading a lab, and basic finance. We’ve made a start, but there’s a lot still to be done.

Software Engineering: An Evidence-Based Approach

Unlike their counterparts in physics, psychology, and engineering, most students in computer science don’t do experiments. As a result, they graduate not knowing how to get data, clean it up, model it, and draw conclusions from it. This innovative textbook corrects that: it tackles simple real-world problems using basic statistical methods and data harvested from actual software projects. Its larger message is that opinions about software should be based on evidence rather than hearsay and strong opinions. (Derek Jones’ Empirical Software Engineering Using R is headed in this direction.)

Computing and the Law: A Guide for the Perplexed

The legal aspects of software have always been complicated; the web has done nothing to make them simpler. This book seeks to help programmers understand the rules (or lack thereof) they have to live with by tracing the historical development of patents, copyrights, privacy, and professional liability from the Industrial Revolution to the present day. Aimed squarely at people with no prior exposure to legal terminology, it explains concepts clearly and provides examples for each.

Software Architecture by Example

Architects study hundreds of buildings during their training; writers read hundreds of novels, and mathematicians study at least that many proofs. In contrast, most software engineers only explore a handful of medium-sized programs during their training. This book corrects that by contrasting alternative implementations of key features of open source applications. Whether it’s the undo/redo stacks of Vim and Emacs, how Apache and Nginx manage user plugins, or the way that React and Angular decide what to re-render, each paired example serves as a springboard for larger discussion of how software is designed and how tradeoffs are made. The book draws material from The Architecture of Open Source Applications, but is a tutorial rather than a survey.

A Practical Introduction to Debugging

Most programmers spend a large part of their time debugging, but most books only show working code, and never discuss how to prevent, diagnose, and fix errors. Most books ostensibly about debugging are either high-level handwaving (“Make sure you’re solving the right problem”), user’s guides for particular debugging tools, or out of date. The one notable exception, Zeller’s Why Programs Fail, is an excellent read, but too advanced for most undergraduates. This book fills that gap by combining an exploration of how debugging tools actually work with dozens of case studies showing how to apply them to real-world problems. And while the author only occasionally makes this explicit, the book also shows how to write programs that are easier to fix.

Now What? A Practitioner’s Guide to Error Handling

Programs can fail in a hundred different ways, but most programmers either ignore the possibility of failure or deal with it by printing a log message. This companion to A Practical Introduction to Debugging presents examples of what they could do instead, from data structure repair to automatically restarting servers. Along the way, it catalogs the kinds of errors that programmers may encounter and shows how they can be prevented as well as managed.
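As one illustration of "automatically restarting" rather than only logging, here is a minimal Python sketch. The names `run_with_restarts` and `make_flaky_task` are hypothetical and invented for this example; they are not taken from any existing book or library, and a real supervisor would need to consider backoff, resource cleanup, and which errors are safe to retry.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_restarts(
    task: Callable[[], T],
    max_restarts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Run task, restarting it after a failure up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return task()
        except Exception as error:
            if restarts >= max_restarts:
                # Out of restarts: re-raise so the caller sees the failure.
                raise
            restarts += 1
            print(f"task failed ({error!r}); restart {restarts} of {max_restarts}")
            time.sleep(delay_seconds)


def make_flaky_task(failures_before_success: int) -> Callable[[], str]:
    """Hypothetical stand-in for a server: fails a set number of times, then succeeds."""
    remaining = {"count": failures_before_success}

    def task() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("simulated crash")
        return "ok"

    return task


if __name__ == "__main__":
    # Two simulated crashes, then success on the third run.
    print(run_with_restarts(make_flaky_task(2)))
```

The design choice worth noticing is that the failure is still surfaced once the restart budget is spent, so the caller can escalate instead of looping forever.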

300 Lines of Science

Can you write a climate simulator in less than 300 lines of Python? What about constructing phylogenetic trees in less than 300 lines of R? This collection would show readers how science is turned into code across a broad range of disciplines. Each entry is less than 300 lines of code in the style of 500 Lines or Less, supplemented by an equal-sized chunk showing how to test what has been written.

Performance Tuning

In the spirit of Jon Louis Bentley’s Writing Efficient Programs, this textbook shows readers how to model, analyze, and improve the performance of their programs. Written for undergraduates who already have a basic understanding of computer architecture, compilers, operating systems, and networks, it can be used in a capstone course that unifies ideas from these subjects.

Teach programmers how to run grassroots get-into-coding groups

Programmers create get-into-coding groups so that people won’t have to teach themselves how to write JavaScript, then reinvent wheels when it comes to teaching, running a non-profit, or organizing a community. I am adding a section to How to Teach that draws on the experience of Software Carpentry and books like Building Powerful Community Organizations to fill that gap.


Allow people to create synchronized voiceovers for HTML slideshows. I’ve had several summer students take a run at this; the hard part is the authoring tool to add time marks, but as the demo linked in the title shows, the idea itself works.

Diff and merge for common document formats

Version control is a powerful idea, but it depends on people being able to work independently, then see what they’ve changed and merge those differences. Unfortunately, none of today’s open source version control systems can handle the world’s most common document formats: Word, Excel, and PowerPoint. A tool (or suite of tools) that could do this would give millions of people an on-ramp instead of a cliff.
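To give a sense of the starting point, here is a rough Python sketch that diffs the paragraph text of two Word files using only the standard library. It relies on the fact that a .docx file is a zip archive whose main content lives in `word/document.xml`. The function names are hypothetical, the sketch ignores formatting, tables, and tracked changes, and it does not attempt merging, which is the harder half of the problem.

```python
import difflib
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source) -> list[str]:
    """Return the plain text of each paragraph in a .docx path or file object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in paragraph.iter(f"{W_NS}t"))
        paragraphs.append(text)
    return paragraphs


def diff_docx(old, new) -> list[str]:
    """Unified diff of two documents, treating each paragraph as one line."""
    return list(
        difflib.unified_diff(
            docx_paragraphs(old),
            docx_paragraphs(new),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


def make_minimal_docx(paragraphs: list[str]) -> io.BytesIO:
    """Hypothetical demo helper: builds just enough of a .docx for this sketch.

    The result is not a complete file that Word would open, and the
    paragraph text is not XML-escaped.
    """
    body = "".join(f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>" for text in paragraphs)
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    old = make_minimal_docx(["Introduction", "Methods"])
    new = make_minimal_docx(["Introduction", "Results"])
    print("\n".join(diff_docx(old, new)))
```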

An empirical comparison of the syntax of Python, R, MATLAB, and Julia

We tried to repeat Stefik et al.’s study of programming language syntax for languages commonly used in science, but weren’t able to get enough subjects. I think it’s worth trying again, both for its own sake and to show that this kind of work can and should be done.

The Discussion Book online

Today’s MOOC platforms use the Internet like television. What would they look like if they directly supported some of the techniques described in this useful book? Similarly, Caulfield’s notion of choral explanations has me thinking that I’ve been mistaken in trying to treat lesson construction as software development. A “lesson” platform that uses Stack Overflow as its model rather than GitHub or Wikipedia would be fascinating to explore, as would collaborative choral software exegesis.

Using machine learning to find actual design patterns

I often use Sajaniemi et al.’s roles of variables in teaching, but like the classic design patterns, they were “discovered” by eyeballing novice code. I think that cluster analysis of patterns of class and variable use would uncover more patterns, and confirm my suspicion that some of the classics are really just different names for the same thing. Similarly, I think that analysis of actual usage patterns could lay the foundation for the design of a better version control system.

Numerical JavaScript

Ten years from now, I believe that JavaScript (or a derivative like TypeScript) will have supplanted Python and R as the language of choice for people doing leading-edge open scientific computing, because no matter what else programmers use, they eventually have to learn JavaScript. More specifically, I expect that the 5-15% of scientists who are early adopters of new technology will bypass single-purpose languages like Julia in favor of the one they already have to master to create websites and use things like D3. And with major players like Microsoft, Google, and Facebook all working hard to make general-purpose JavaScript faster, it will be harder and harder for niche players to keep up.