These ideas are things I might have done if I hadn’t done Software Carpentry. Many are presented as descriptions of books that don’t yet exist because I still believe that a good book can change the world. If you’d like to help with any of them, or take one on yourself, please let me know.
Sex and Drugs and Guns and Code: What Everyone in Tech Needs to Know About Politics, Economics, and Power
Inspired by books like Physics for Future Presidents and The Imposter’s Handbook, this book is aimed at people who need to understand the big picture, but have limited time in which to do it. Rather than science or programming, it introduces ideas and methods that are commonplace in the social sciences, and illustrates that “the way things are” is neither inevitable nor accidental. From the reasons racial discrimination persists despite its illogical economic inefficiency to the ways in which “flat” organizations actually operate and how cognitive biases affect us all, it gives readers a toolbox for thinking about society: the society that tech is reshaping for both better and worse.
The Undergraduate Operator’s Manual
What effect does pulling an all-nighter have on the quality of your work? How do people absorb and retain knowledge? What are some good ways to run a project meeting, and how can you get people to actually pull their weight? Every undergraduate has to deal with these issues and a hundred others, but most of the time, they are expected to rediscover or reinvent methods themselves. This textbook for a one-semester course puts solutions in one place, and by doing so teaches a lot about physiology, psychology, and organizational behavior.
Research Computing from A to B
Based on the workshops run by Software Carpentry and Data Carpentry, this book is a hands-on introduction to practical computing skills aimed at graduate students and professionals in research-intensive disciplines. The core topics (data management, structured programming, task automation, and version control) are introduced through a series of short tutorials, then elaborated with further lessons on using the web to share data, creating reproducible workflows, cleaning data, and testing software when the right answer isn’t actually known. While it necessarily glosses over many fine points, it gives readers a useful toolkit and a sense of where to go next.
Managing Research Software Projects
Your graduate degree is in ecology, but now you’re running a three-person team responsible for building and maintaining a hundred thousand lines of code? This book is an overview of everything you absolutely, positively need to know after you know how to program: marketing, community management, leading a lab, and basic finance. We’ve made a start, but there’s a lot still to be done.
Software Engineering: An Evidence-Based Approach
Unlike their counterparts in physics, psychology, and engineering, most students in computer science don’t do experiments. As a result, they graduate not knowing how to get data, clean it up, model it, and draw conclusions from it. This innovative textbook corrects that: it tackles simple real-world problems using basic statistical methods and data harvested from actual software projects. Its larger message is that opinions about software should be based on evidence rather than hearsay and strong opinions. (Derek Jones’ Empirical Software Engineering Using R is headed in this direction.)
Computing and the Law: A Guide for the Perplexed
The legal aspects of software have always been complicated; the web has done nothing to make them simpler. This book seeks to help programmers understand the rules (or lack thereof) they have to live with by tracing the historical development of patents, copyrights, privacy, and professional liability from the Industrial Revolution to the present day. Aimed squarely at people with no prior exposure to legal terminology, it explains concepts clearly and provides examples for each.
Software Architecture by Example
Architects study hundreds of buildings during their training, writers read hundreds of novels, and mathematicians study at least that many proofs. In contrast, most software engineers only explore a handful of medium-sized programs during their training. This book corrects that by contrasting alternative implementations of key features of open source applications. Whether it’s the undo/redo stacks of Vim and Emacs, how Apache and Nginx manage user plugins, or the way that React and Angular decide what to re-render, each paired example serves as a springboard for larger discussion of how software is designed and how tradeoffs are made. The book draws material from The Architecture of Open Source Applications, but is a tutorial rather than a survey.
A Practical Introduction to Debugging
Most programmers spend a large part of their time debugging, but most books only show working code, and never discuss how to prevent, diagnose, and fix errors. Most books ostensibly about debugging are either high-level handwaving (“Make sure you’re solving the right problem”), user’s guides for particular debugging tools, or out of date. The one notable exception, Zeller’s Why Programs Fail, is an excellent read, but too advanced for most undergraduates. This book fills that gap by combining an exploration of how debugging tools actually work with dozens of case studies showing how to apply them to real-world problems. And while the author only occasionally makes this explicit, the book also shows how to write programs that are easier to fix.
Now What? A Practitioner’s Guide to Error Handling
Programs can fail in a hundred different ways, but most programmers either ignore the possibility of failure or deal with it by printing a log message. This companion to A Practical Introduction to Debugging presents examples of what they could do instead, from data structure repair to automatically restarting servers. Along the way, it catalogs the kinds of errors that programmers may encounter and shows how they can be prevented as well as managed.
300 Lines of Science
Can you write a climate simulator in less than 500 lines of Python? What about constructing phylogenetic trees in less than 500 lines of R? This collection would show readers how science is turned into code across a broad range of disciplines. Each entry is less than 300 lines of code in the style of 500 Lines or Less, supplemented by an equal-sized chunk showing how to test what has been written.
In the spirit of Jon Louis Bentley’s Writing Efficient Programs, this textbook shows readers how to model, analyze, and improve the performance of their programs. Written for undergraduates who already have a basic understanding of computer architecture, compilers, operating systems, and networks, it can be used in a capstone course that unifies ideas from these subjects.
Teach programmers how to run grassroots get-into-coding groups
Allow people to create synchronized voiceovers for HTML slideshows. I’ve had several summer students take a run at this; the hard part is the authoring tool to add time marks, but as the demo linked in the title shows, the idea itself works.
Diff and merge for common document formats
Version control is a powerful idea, but it depends on people being able to work independently, then see what they’ve changed and merge those differences. Unfortunately, none of today’s open source version control systems can handle the world’s most common document formats: Word, Excel, and PowerPoint. A tool (or suite of tools) that could do this would give millions of people an on-ramp instead of a cliff.
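To make "see what they’ve changed" concrete, here is a minimal sketch in Python. It assumes the paragraphs have already been pulled out of the document as plain strings, which is the genuinely hard part for Word, Excel, and PowerPoint and is skipped entirely here. The function name `diff_paragraphs` and the sample text are made up for illustration; only the standard library’s `difflib` is used.

```python
import difflib


def diff_paragraphs(old, new):
    """Report the changed regions between two lists of paragraphs.

    Each result is (tag, old_paragraphs, new_paragraphs), where tag is
    'replace', 'delete', or 'insert'. Unchanged regions are skipped.
    """
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


# Hypothetical paragraphs standing in for text extracted from two
# versions of the same document.
old_version = [
    "Introduction",
    "Methods are described here.",
    "Results",
    "Conclusion",
]
new_version = [
    "Introduction",
    "Methods are described in detail here.",
    "Results",
    "Discussion",
    "Conclusion",
]

for tag, before, after in diff_paragraphs(old_version, new_version):
    print(tag, before, after)
```

For these two versions the sketch reports one replaced paragraph and one inserted paragraph. A real tool would also have to handle formatting, tables, embedded objects, and three-way merges, none of which this attempts.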
An empirical comparison of the syntax of Python, R, MATLAB, and Julia
We tried to repeat Stefik et al.’s study of programming language syntax for languages commonly used in science, but weren’t able to get enough subjects. I think it’s worth trying again, both for its own sake and to show that this kind of work can and should be done.
The Discussion Book online
Today’s MOOC platforms use the Internet like television. What would they look like if they directly supported some of the techniques described in this useful book? Similarly, Caulfield’s notion of choral explanations has me thinking that I’ve been mistaken in trying to treat lesson construction as software development. A “lesson” platform that uses Stack Overflow as its model rather than GitHub or Wikipedia would be fascinating to explore, as would collaborative choral software exegesis.
Using machine learning to find actual design patterns
I often use Sajaniemi et al.’s roles of variables in teaching, but like the classic design patterns, they were “discovered” by eyeballing novice code. I think that cluster analysis of patterns of class and variable use would uncover more patterns, and confirm my suspicion that some of the classics are really just different names for the same thing. Similarly, I think that analysis of actual usage patterns could lay the foundation for the design of a better version control system.