Not on the Shelves

Version 2

(This article originally appeared in Dr. Dobb's Journal in 2003.)

When I first started doing book reviews for Dr. Dobb's Journal in 1997, I wrote an article called "Not on the Shelves". In it, I described eighteen books I'd like to review that no one seemed to have written. The world and I have both moved on since then; this article presents my current thoughts on the subject.

One important difference between then and now is the growing proportion of faculty with industrial experience. During the dot-com boom, many bright minds left academic departments to start companies. (At one point, the drop-out rate among Ph.D. students at Stanford was close to 80%.) Many are now drifting back to universities, either because their startup failed, or because they made some money and can now afford to pursue their own interests. These people are more focused on building real systems than their baby-boomer colleagues who were hired in the 1970s or early 1980s, and they are going to want to change the curriculum to reflect this.

Introduction to Software Development

This book is aimed at undergraduate majors in Computer Science who have completed at least one year of Java programming. Its aim is to teach them what they need to know in order to develop software on spec and on time in a small team. The style is "eat your own cooking". For example, the first chapter covers version control, and in subsequent chapters, students do exercises by checking code out of a repository, working on it, and then submitting it for grading by checking it back in. Similarly, the chapter on testing introduces JUnit, a unit testing framework developed by the Extreme Programming community. Many subsequent examples then include a few unit tests, and a few of the exercises at the end of each chapter require students to write tests.

Two features of this book are particularly noteworthy. The first is the way in which it shows students how the theory can be put into practice. The chapter on build systems, for example, discusses basic graph algorithms as part of explaining how tools like Make work. It then uses the implementation of a simple Make-like tool as a setting for further discussion of finite state machines (for parsing) and design patterns.
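To make the connection between graph algorithms and build tools concrete, here is a minimal sketch of the core of a Make-like tool: a depth-first traversal that puts targets in an order where every dependency comes before the things that need it. It is written in Python for brevity rather than in the Java the book assumes, and the function name, the rule format, and the file names are all invented for illustration.

```python
def build_order(rules, target):
    """Return the targets needed for `target`, dependencies first.

    `rules` maps each target to the list of targets it depends on.
    Targets with no entry in `rules` are treated as source files.
    Raises ValueError if the dependency graph contains a cycle.
    """
    order = []           # finished targets, in a valid build order
    done = set()         # targets whose dependencies are fully processed
    in_progress = set()  # targets on the current depth-first path

    def visit(node):
        if node in done:
            return
        if node in in_progress:
            raise ValueError(f"dependency cycle involving {node!r}")
        in_progress.add(node)
        for dep in rules.get(node, []):
            visit(dep)
        in_progress.remove(node)
        done.add(node)
        order.append(node)

    visit(target)
    return order


# Hypothetical rules, in the spirit of a small Makefile.
rules = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
}
print(build_order(rules, "app"))
```

A real tool would also compare timestamps to decide which targets are out of date, and would run a command for each one; the ordering step above is only the graph-theoretic part.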

The second noteworthy feature is the book's use of Stanley, the groupware tool which is the running example in Multi-Tier Architectures: Theory and Practice (reviewed below). Students using that book in a third-year course will therefore already be familiar with what Stanley does, understand the practices it is meant to support, and know how some of its components are built.

Introduction to Computer Architecture and Systems Programming

As a sophomore, I took two courses: one on data structures and algorithms, and one on machine architecture and assembly-language programming. The first is easy to translate into Java (see the review immediately following this one), but what about the second? Going from Pascal to PDP-11 assembler was bad enough; going from an interpreted, garbage-collected language to today's pipelined RISC architectures would be impossible.

This book solves the problem by treating C as a high-level assembly language, and using it to introduce students to low-level issues such as pointer arithmetic, memory allocation, machine-dependent data sizes, signed versus unsigned values, and string handling. Its running example is the implementation of a subset of the Java Virtual Machine (JVM). Like Kamin's Programming Languages: An Interpreter-Based Approach, this book builds up its JVM in stages. Each stage adds another feature (such as function calls), and then shows what has to go on under the hood to make it work. Along the way, students are introduced to the quirks of C, including the use of preprocessor directives to handle platform-dependent code.
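The staged approach can be sketched with a toy stack machine. The version below is in Python rather than C, and its instruction set (PUSH, ADD, MUL, CALL, RET, HALT) is invented for illustration; it is not real JVM bytecode. The first stage would be the arithmetic instructions alone; adding function calls then requires a second stack of return addresses, which is the kind of under-the-hood detail the book is described as exposing.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs; return the operand stack."""
    stack = []   # operand stack used by arithmetic instructions
    calls = []   # return addresses, added in the "function call" stage
    pc = 0       # index of the next instruction to execute
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "CALL":
            calls.append(pc)   # remember where to resume after the call
            pc = arg
        elif op == "RET":
            pc = calls.pop()
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


# Call a "double" routine (at address 4) twice on the value 3.
program = [
    ("PUSH", 3),     # 0
    ("CALL", 4),     # 1
    ("CALL", 4),     # 2
    ("HALT", None),  # 3
    ("PUSH", 2),     # 4: start of "double"
    ("MUL", None),   # 5
    ("RET", None),   # 6
]
print(run(program))
```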

By the middle of the book, the author has stopped adding features to the JVM, and started building support tools. The most important of these is an execution profiler, which uses sampling and instrumentation to collect statistics about program behavior. Results from it are used to segue into a discussion of caching, virtual memory, and other aspects of machine architecture that haven't been covered earlier. The author also covers error handling on both Unix and Windows, a topic which most other textbooks gloss over, but which makes up 30-40% of the code in real applications.
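Of the two techniques mentioned, instrumentation is the easier to show in a few lines. The sketch below is in Python, not the C the book uses, and the names `profiled` and `stats` are invented for illustration. It wraps a function so that every call is counted and timed; a sampling profiler would instead interrupt the program at intervals and record where it happens to be.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function statistics, keyed by function name.
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def profiled(func):
    """Wrap `func` so that each call is counted and timed."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = stats[func.__name__]
            entry["calls"] += 1
            # Inclusive time: for a recursive function, nested calls
            # are counted again inside their callers.
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@profiled
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


fib(10)
print(stats["fib"]["calls"], "calls to fib")
```

Even this small example shows why profiles are a natural lead-in to caching: the call count makes the repeated work in the naive recursion obvious.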

Multi-Tier Architectures: Theory and Practice

Most large computer systems these days are multi-tiered: cell phones and web browsers talk to web servers, which in turn talk to application servers backed up by LDAP directories and SQL databases. This book gives students an end-to-end view of these components and their interactions. It starts off with two chapters on networking: what sockets and ports are, how name lookup works, and so on. Chapter 3 then implements a simple web server in Java, which Chapter 4 extends with multithreading. Chapter 5 then extends the server further to turn it into a simple Java servlet container modeled after the Apache Foundation's Tomcat.
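The heart of a server like the one in Chapter 3 is small enough to sketch. The version below is in Python rather than Java, handles only GET requests for pages held in memory, and uses invented names throughout (`PAGES`, `handle_request`, `serve`). Separating the pure request-to-response function from the socket loop keeps the former easy to test, and leaves an obvious place to add one thread per connection later.

```python
import socket

# Hypothetical in-memory "site": maps request paths to page bodies.
PAGES = {"/": b"Hello from a toy server\n"}


def _response(code, reason, body):
    """Build the raw bytes of a minimal HTTP/1.0 response."""
    head = (
        f"HTTP/1.0 {code} {reason}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Content-Type: text/plain\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def handle_request(raw):
    """Turn the raw bytes of a request into the raw bytes of a response."""
    try:
        request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
        method, path, _version = request_line.split(" ")
    except (UnicodeDecodeError, ValueError):
        return _response(400, "Bad Request", b"bad request\n")
    if method != "GET":
        return _response(405, "Method Not Allowed", b"only GET is supported\n")
    body = PAGES.get(path)
    if body is None:
        return _response(404, "Not Found", b"not found\n")
    return _response(200, "OK", body)


def serve(port=8080):
    """Accept connections one at a time (single-threaded, never returns)."""
    with socket.create_server(("localhost", port)) as server:
        while True:
            conn, _addr = server.accept()
            with conn:
                conn.sendall(handle_request(conn.recv(4096)))
```

Calling `serve()` starts the loop; it is deliberately not called here, since it blocks until the process is killed.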

The second half of the book switches from building a toy servlet engine to using a real one. Over the course of three chapters, it introduces students to Apache's Struts application framework. The exposition steers clear of the framework's more arcane details; instead, the emphasis is on analyzing it in terms of design patterns, and on using it to motivate discussion of reflection, dynamic class loading, performance issues, and other aspects of real systems. The chapter on caching, for example, analyzes half a dozen different algorithms from a theoretical point of view, and then compares that with the performance of actual implementations.
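A comparison of caching policies can be sketched by replaying the same sequence of requests against two policies and counting hits. The example below is in Python and compares least-recently-used (LRU) eviction with first-in-first-out (FIFO); the function names and the request trace are invented for illustration, and the review does not say which algorithms the book covers.

```python
from collections import OrderedDict, deque


def lru_hits(trace, capacity):
    """Count cache hits when the least recently used key is evicted."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
    return hits


def fifo_hits(trace, capacity):
    """Count cache hits when the oldest inserted key is evicted."""
    cache = set()
    order = deque()  # keys in insertion order
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.discard(order.popleft())  # evict oldest insertion
            cache.add(key)
            order.append(key)
    return hits


trace = ["A", "B", "C", "A", "D", "A", "B"]
print("LRU hits:", lru_hits(trace, 3))
print("FIFO hits:", fifo_hits(trace, 3))
```

On this trace LRU keeps the frequently requested "A" in the cache while FIFO evicts it, which is the kind of difference a theoretical analysis predicts and a measurement confirms.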

The running example in the book's second half is Stanley, a simple groupware tool modeled after SourceForge. Like SourceForge, Stanley lets users create and manage software projects. It combines web interfaces to version control, unit testing, batch builds, mailing lists, and so on, and its implementation is a good vehicle for introducing students to the tools, libraries, and protocols on which these tools are built. (Students who have used Introduction to Software Development in an earlier course will already be familiar with Stanley.)

Design and Implementation of Virtual Machines

It's difficult to get a degree in computer science without doing a course on compilers, but it's almost impossible to find a course on virtual machines. This book sets out to correct that by showing students how the "virtual" part of languages like Java, C#, and Python is built. It analyzes and compares the runtime engines for three popular languages: Java's JVM, Python's PVM, and .NET's CLI. The first is used as a baseline, since most students will already be familiar with it (particularly if they have already taken a course using Introduction to Computer Architecture and Systems Programming).

The PVM is used to show students how freely-typed interpreters are built; the authors could have used Scheme, Smalltalk, or Perl here, but as they say in their discussion, the first two are fading, while the third is far too tangled. Their discussion of Microsoft's CLI then gives them an opportunity to delve into just-in-time compilation, component systems, and generic programming. Here and elsewhere, they return to three topics over and over again: how to make it right, how to make it fast, and how to make it safe. Most students will probably find the two chapters on security issues fascinating, as the authors describe what buffer overrun attacks are, and how the JVM's class loader verifies byte codes. The book closes with a chapter each on how debuggers work, and on the C and C++ runtimes.

Quality Assurance: Theory and Practice

Lots of books have been written about software testing and quality assurance (QA), but few have sold well. This is partly due to snobbery---most programmers see testing as a low-status occupation---but the books themselves must share some of the blame. Take Dustin's recent Effective Software Testing as an example: it's impossible to argue with her "50 Specific Ways to Improve Your Testing" (to quote the book's subtitle), but the book doesn't actually show me examples of how to go about testing, or give me tools that I can use to do it.

In contrast, Quality Assurance: Theory and Practice describes both the testing process, and a suite of Open Source tools that developers can download and use. After one brief chapter on terminology (which thankfully only spends one page on horror stories of what can go wrong when software fails), the author moves straight into examples of unit testing. Chapter 3 then analyzes the patterns uncovered in Chapter 2, and uses those patterns to motivate the design of JUnit. Chapter 4 presents a catalog of common programming mistakes, along with JUnit tests that can catch them.
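An entry in a catalog like Chapter 4's might pair a tempting mistake with the tests that expose it. The sketch below uses Python's built-in `unittest` module, which follows the same xUnit design as JUnit, rather than JUnit itself; the function `last_n` and the particular mistake are invented for illustration.

```python
import unittest


def last_n(items, n):
    """Return the last n items of a list (an empty list when n is 0)."""
    # The tempting one-liner `items[-n:]` is wrong for n == 0, because
    # `items[-0:]` means `items[0:]` and returns the whole list.
    if n >= len(items):
        return list(items)
    return items[len(items) - n:]


class LastNTest(unittest.TestCase):
    def test_typical_case(self):
        self.assertEqual(last_n([1, 2, 3], 2), [2, 3])

    def test_zero_items_requested(self):
        # This is the boundary test that catches the `items[-n:]` mistake.
        self.assertEqual(last_n([1, 2, 3], 0), [])

    def test_more_than_available(self):
        self.assertEqual(last_n([1, 2, 3], 5), [1, 2, 3])
```

Saved in a file, these tests can be run with `python -m unittest`. The typical case passes with either implementation; only the boundary test distinguishes them, which is the point of cataloging mistakes alongside tests.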

In the second part of the book, Chapter 5 looks at integrating testing into the build cycle using Ant and Dartboard. Chapter 6 then looks at testing multithreaded programs, while Chapter 7 moves on to distributed applications, and tools like the Apache Foundation's Cactus. Performance issues are covered in Chapter 8, while Chapters 9, 10, and 11 return to the software lifecycle by looking at bug-tracking systems, requirements analysis, and test plans. Testing for security holes gets a chapter of its own (Chapter 12), as do internationalization (Chapter 13) and usability (Chapter 14).

There are two reasons why this book will probably succeed where others have failed. The first is that its blend of "what" and "how" means that readers can immediately see how to implement the ideas that are being presented to them. The second, equally important, is that by showing programmers tools, the author is signalling that testing is a programming activity, and not something to be looked down on.

Modern Operating Systems

When Andrew Tanenbaum's Operating Systems: Design and Implementation appeared in the mid-1980s, a typical PC had an Intel 80286 processor with one small cache, 640 KByte of RAM, and a 10 MByte hard drive. Many of today's desktop machines have caches larger than those hard drives, and the complexity of operating systems has grown to the same degree. It is simply no longer possible for anyone, particularly a student, to understand an entire desktop operating system from top to bottom.

Luckily for education, a host of simpler computing devices, and operating systems, are now available. Cell phones, PDAs, and children's game consoles all have their own operating systems, and those are the focus of this book. The topics are the same as in most other books on operating systems: bootstrapping, concurrency, resource management, and so on. However, when the authors of Modern Operating Systems discuss Unix and Windows, they do so in the same terms that authors of earlier books used when talking about OS/360. The real action, they feel, is in small, interactive devices. Accordingly, they devote less space than most books to file systems, and more to networking and graphical displays. Palm OS, Symbian, and QNX are used as examples, and students are (strongly) encouraged to download the authors' own SLOS (Stupid Little Operating System), which runs natively on Palm-compatible hardware, and in an emulator on Windows and Linux.

Case Studies in User Interface Design

Some philosophers distinguish between knowing that, and knowing how. The former is mostly facts: what is the capital of Samoa, what is the airspeed of an unladen African swallow, and so on. The latter primarily consists of techniques, like how to ride a bicycle, or how to design a user interface. Unfortunately, knowledge of the second type is hard to put into books, since every general rule has exceptions, and none of the important ideas can be given exact definitions.

This book on user interfaces tries to teach the "how" by working through fourteen examples. The starting point for each of the first seven is an existing interface that needs improvement. The authors describe the interface, analyze its shortcomings, and then show how to improve it. In the last three of these examples, the authors then criticize the new interface, and improve on it again.

The examples in the second half of the book start with a blank canvas. The first two studies begin by describing applications whose existing interfaces are command-line based; as the authors point out, interface designers are often required to retro-fit GUIs to existing programs, so they might as well get used to it early on. In the last five of the studies, the application itself is up for grabs: given a vague specification that reads like an email message from someone in marketing, the designers are required to figure out both what the program should do, and how it should appear.

The greatest strength of this book is that it shows how interfaces are developed, not just what they look like when they're done. Two of the studies, for example, devote as much space to blind alleys as to the final, finished interface. By doing this, the authors show how designers iterate over a design. Ideas such as balance and emphasis are taught by example, rather than by definition. Throughout, the authors are careful to draw examples from a variety of real systems, and to show variations appropriate to desktop clients, web browsers, and hand-held devices.

Computing and the Law

The legal aspects of the software business were complicated enough when the major problem was people using software without paying for it. The advent of the World-Wide Web has squared and cubed the problem. If you use a GIF image as a button in your home page, for example, and I download it for use in my page without asking your permission, am I breaking the law? What if you copied that button from someone whose page explicitly said that it wasn't in the public domain, but you didn't include a note to that effect? And what if I then printed out my page, GIF and all? Would that be illegal? In Ontario, the answers are (currently) "no", "no", and "yes", but other jurisdictions might not even officially recognize that there are issues to address.

Computing and the Law presents this problem, and several like it, in its first chapter. The next three chapters then trace the historical development of property law from land, through patents and copyrights, to the invention of photocopying. As the authors say at the start of chapter five, "That's when quiet hell broke loose." The authors show how cheap reproduction, particularly digital reproduction, is reshaping the intellectual underpinnings of capitalism. Look and feel, free software, and litigation as intimidation are all discussed, along with privacy rights and jurisdictional issues.

This book is aimed squarely at programmers with no prior exposure to legal terminology. New terms are clearly explained, and then put into context. One section discusses what rights students have to the software they produce during their studies; another, who owns things that were produced by companies that no longer exist; and a third, the furore that surrounds encryption in the United States. Where they can, the authors concentrate on principles rather than particular statutes, as the latter are so often either non-existent or changing rapidly. This not only makes the book more readable, it also ensures that it won't quickly be outdated.

Software Development for Scientists and Engineers

This book covers much of the same ground as Introduction to Software Development (reviewed earlier), but with less depth and more breadth. Its audience is seniors, graduate students, or professionals whose background is science or engineering, rather than programming. Readers are assumed to know what loops, conditionals, and function calls are, but nothing beyond that is taken for granted.

The book starts off by introducing version control. It then introduces Python, a popular Open Source scripting language. The authors' aim is not to sell Python to readers, but rather to have a simple language in which to present regular expressions, file system operations, job control, processing XML and binary data, and so on. Lots of "how" is mixed in with the "what": the chapter on writing classes in Python is followed by one on unit testing, while the chapter on processing XML is followed by one on design patterns.
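The kind of task such a book would use regular expressions for might look like the following sketch, which pulls well-formed readings out of a messy text file and skips everything else. The record layout (site code, date, value), the names `READING` and `parse_readings`, and the sample data are all invented for illustration.

```python
import re

# One reading per line: site code, ISO date, numeric value.
READING = re.compile(
    r"^(?P<site>[A-Z]{2}\d+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<value>-?\d+(?:\.\d+)?)$"
)


def parse_readings(lines):
    """Return (site, date, value) tuples for lines that match READING."""
    readings = []
    for line in lines:
        match = READING.match(line.strip())
        if match:  # silently skip comments and malformed lines
            readings.append(
                (match["site"], match["date"], float(match["value"]))
            )
    return readings


lines = [
    "# site date temperature",
    "DR3 2003-02-14 -1.5",
    "MS12  2003-02-15  4",
    "corrupted line",
]
print(parse_readings(lines))
```

A production script would probably report the lines it skipped rather than dropping them silently; that choice is left out here to keep the pattern itself in focus.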

It would be impossible to cover all the material in this book in a single semester, but the later chapters are well suited to self-study. Advanced topics range from graphics through integrating with legacy C and Fortran to the software lifecycle. Taken as a whole, the book gives scientists and engineers a useful toolkit, and a sense of where to go next.