Not on the Shelves

Version 3

(This article originally appeared online in 2009.)

Every couple of years, I indulge in a bit of sympathetic magic by putting together a list of books I want someone to write, so that I can review them. If nothing else, doing this helps me figure out what I currently think is and isn't important in our profession. Previous versions were written in 1997 and 2003; here's today's.

The Architecture of Open Source Applications

This survey, which is intended for use in a senior undergraduate course, describes the architecture of a dozen pairs of open source applications: Apache and lighttpd, Firefox and Opera, Gnumeric and OpenOffice Calc, and so on. For each pair, the book lays out the key characteristics of the problem domain, sketches a "reference architecture" for a solution, then dissects the chosen applications in detail, describing where, how, and (most importantly) why they differ from, or elaborate on, the reference architecture. Each chapter is a useful introduction to the innards of the systems it discusses, but together, they provide compelling examples of how to design and explain large pieces of software.

The Design and Implementation of Virtual Machines

With the exception of C/C++, most widely used languages today sit on top of some kind of virtual machine. However, while undergraduates are routinely taught how to build compilers, they are rarely if ever shown how VMs work. This textbook corrects that via stepwise refinement: the first three chapters present a very simple direct interpreter, which each of the next 11 chapters improves in some way. More focused than either Smith & Nair's or Craig's book, DIVM is also an excellent introduction to systems programming.
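To give a feel for what "a very simple direct interpreter" could mean as a starting point, here is a minimal sketch of a stack machine in Python. It is not drawn from any real book: the opcode names (PUSH, ADD, MUL) and the run function are assumptions made up for illustration.

```python
def run(program):
    """Interpret a list of (opcode, operand) pairs on a tiny stack machine.

    The opcodes are hypothetical and exist only for this sketch.
    """
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# Encodes the expression (2 + 3) * 4.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
```

Each later refinement of the kind the paragraph describes (a compact bytecode encoding, a call stack, and so on) would replace one piece of this loop while leaving the rest recognizable.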

Debuggers: Theory and Practice

Despite their importance, debuggers are ignored by both educators and authors. In part, this is because there isn't a tidy theory to teach, as there is with (for example) parsers or databases, but in part there is also an element of disdain for something that is "just a tool". As this book shows, it might be just a tool, but it's a very complicated tool to get right. Like Levine's excellent Linkers and Loaders (but unlike Rosenberg's shallow and disappointing How Debuggers Work), this book delves into the details rather than glossing over the "hard" bits, and would be a great text for a second course on computer architecture.

Software Carpentry for Scientists and Engineers

This book is an introduction to basic software development practices for seniors, graduate students, and professionals whose background is science or engineering, rather than programming. After quick tutorials on the shell, version control, and automated builds, it introduces Python, which it then uses as a platform on which to present regular expressions, file system operations, job control, processing XML and binary data, and so on. Lots of "how" is mixed in with the "what": the chapter on writing classes in Python is followed by one on unit testing, while the chapter on processing XML is followed by one on design patterns. While it necessarily glosses over many fine points, taken as a whole, the book gives scientists and engineers a useful toolkit, and a sense of where to go next.

Software Tools for the World-Wide Web

Kernighan and Plauger's Software Tools was one of the most influential books in the history of computing, as it introduced a whole generation of programmers to the Unix philosophy of tool-based computing. In retrospect, one of the reasons that philosophy succeeded was its reliance on a universal data format (strings of ASCII text) and communication protocol (stdin, stdout, and exit(N)). This book starts from the now-commonplace observation that XML and HTTP have taken their place, and goes on to build a suite of ever-more-sophisticated tools for assembling web-based applications that use them. Crucially, the author does not shy away from the thorny issues of transactions and partial failure: from Chapter 8 onward, her examples all discuss the ways in which distributed applications differ from their desktop predecessors, and how programmers should take that into account.
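The text-in, text-out, status-code contract mentioned above can be sketched in a few lines of Python. The function name grep_filter and its parameters are hypothetical; the only convention borrowed from real tools is that a zero status means success, as with grep.

```python
import io


def grep_filter(pattern, infile, outfile):
    """Copy lines containing `pattern` from infile to outfile.

    Returns 0 if at least one line matched and 1 otherwise, mirroring
    the usual exit-status convention of Unix filters.
    """
    matched = False
    for line in infile:
        if pattern in line:
            outfile.write(line)
            matched = True
    return 0 if matched else 1


# In-memory streams stand in for stdin and stdout in this sketch.
source = io.StringIO("alpha\nbeta\ngamma\n")
sink = io.StringIO()
status = grep_filter("et", source, sink)
```

In a real command-line tool the same function would be called with sys.stdin and sys.stdout, and its return value passed to sys.exit, which is what lets the shell chain such filters together.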

Jonad in a Nutshell

Windows PowerShell (originally codenamed Monad) was as revolutionary in its own way as XML and HTTP: instead of flat text, PowerShell allowed tools to send one another streams of objects, with all the flexibility and extensibility that implies. This book describes Jonad, an open source workalike that uses JavaScript as its scripting language instead of PowerShell's custom Perl-like syntax. While it is not (yet) as well integrated with Linux as PowerShell is with recent releases of Windows, Jonad has brought thousands of administrators and power users back to the command line. These users will find this reference guide indispensable.
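The difference between piping flat text and piping objects can be illustrated with Python generators. This is only an analogy, not PowerShell's actual object model: the functions list_processes, where, and select, along with the sample records, are all invented for this sketch.

```python
def list_processes():
    """Yield made-up process records; a real tool would query the OS."""
    yield {"name": "editor", "pid": 101, "memory_mb": 250}
    yield {"name": "shell", "pid": 102, "memory_mb": 12}
    yield {"name": "browser", "pid": 103, "memory_mb": 900}


def where(records, predicate):
    """Pass along only the records for which predicate(record) is true."""
    return (r for r in records if predicate(r))


def select(records, *fields):
    """Keep only the named fields of each record."""
    return ({f: r[f] for f in fields} for r in records)


# Each stage receives structured records, so no stage has to re-parse text.
heavy = list(
    select(
        where(list_processes(), lambda r: r["memory_mb"] > 200),
        "name",
        "pid",
    )
)
```

Because each stage sees named fields rather than columns of characters, a filter like the one on memory_mb does not break when another tool changes how it formats its output.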

Computing and the Law: A Guide for the Perplexed

The legal aspects of the software business were complicated enough when the major problem was people using software without paying for it. Today, courts and legislatures are routinely asked to deal with child pornography, file downloading, and a host of other issues whose nature they barely comprehend, while programmers lose sleep over privacy, liability, and the like. This book seeks to help the latter community by tracing the historical development of patents, copyrights, and professional responsibilities from the American Revolution to the present day. Aimed squarely at programmers with no prior exposure to legal terminology, it explains concepts clearly, and provides examples for each. Where they can, the authors concentrate on principles rather than particular statutes, as the latter are so often either non-existent or changing rapidly. This not only makes the book more readable, it also ensures that it won't quickly be outdated.

Programming Small Devices

This book's aim is to help students who have been spoiled by gigabytes of RAM and gigahertz processors deal with the harsher, leaner world of small devices. Like Müldner's C for Java Programmers, it assumes readers already know how to program, and focuses on dispelling their misconceptions and curing their sloppy habits. What do you do when you don't have garbage collection? What do you do when you don't even have floating point, or when you have to worry about watts as well as bytes? Incorporating as it does a dense chapter on the basics of queueing theory and performance modeling, this is an excellent introduction to programming in a world where nothing is free.

Quality Assurance: A Modern Synthesis

Quality assurance (QA) has gone through a not-so-quiet renaissance in the past ten years, inspired primarily by Extreme Programming and the surprisingly rapid adoption of JUnit. A growing number of programmers actually test their code, and a growing number of those are building ever-more-advanced tools to help do it. This survey describes the principles that underpin those tools, ranging from mock objects and browser-based acceptance testing to more experimental ideas of the kind discussed in Zeller's Why Programs Fail. It then goes on to look at testing as a social science, which may point the way to the next wave of innovation.

Extensible Programming Systems

Today's computers are millions of times faster than their circa 1970 ancestors; today's programs are hundreds of times more complex; but today's source code is still a slave to the formatting conventions of the punch card era. This book describes a new kind of programming ecosystem, in which compilers, linkers, debuggers, and other tools are plugin frameworks, rather than monolithic applications, and programs are stored as XML documents, so that programmers can store and manipulate data and meta-data uniformly. The end result is a programming environment in which ordinary programmers can define and share powerful new ways to represent their intentions.

Difference Engines

Modern version control systems do a great job of managing text, but are much clumsier when it comes to images, MP3s, spreadsheets, and other so-called "binary" files. The reason is simple: those formats are supported by tools for reading and writing, but not for differencing and merging. This survey describes the science behind an open source library of tools (the "engines" of the title) that can handle many widely used formats, from structured text like XML to common audio and video standards. Some readers will be put off by the mathematics the authors use when proving bounds on the performance of the algorithms underlying these tools, but even they should appreciate the power and elegance of this work.

A Reference Architecture for Web 2.0

The latest generation of web applications use much richer technologies than their 90s-era forebears. AJAX (Asynchronous JavaScript and XML) is the most visible of these, but under the hood, frameworks like Ruby on Rails and TurboGears rely on object-relational mapping, reflection, inversion of control, and many other advanced techniques. This book describes this "New Standard Model" from the ground up, showing how the pieces fit together, and discussing what developers can do when they don't. Its emphasis on testing and debugging is particularly welcome, as all too often programmers can only "patch and pray" when something goes wrong somewhere in the many layers of abstraction that were supposed to make their lives easier.

Big, Fast, Cheap, or Good: A Student's Guide to Software Project Management

Students' first software projects are different from "real" industrial projects in many ways: student teams usually don't have a more experienced project manager to show them the ropes, and team members are almost always timeslicing their work with other courses. This down-to-earth book is written with that audience, and those constraints, in mind. It takes ideas from Berkun's Art of Project Management, Doar's Practical Development Environments, Fogel's Producing Open Source Software, Glass's Facts and Fallacies of Software Engineering, and Stellman & Greene's Applied Software Project Management, but explains them to people who have never before had to find their way around a large body of code, or coordinate with half a dozen other people.

Exploring Computer Science with Python

Among the literally thousands of introductory texts on computer science, the few that used Python have never really caught on. This book may change that: it covers the same broad range of topics as Brookshear's popular Computer Science: An Overview, introducing Python almost as a side effect of explaining what algorithms are, how computers represent information, and why perfect security will never be possible. While the pace and depth probably put it out of reach for some students, those thinking about majoring in CS will find it an excellent, practical grounding in the discipline.