Blog

2020 2021 2022 2023
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
2004 2005 2006 2007 2008 2009

A Book I Overlooked

I realized this morning that I left Programming Languages: An Interpreter-Based Approach out of the acknowledgments for Software Design by Example:

cover of 'Programming Languages' by Kamin

I learned almost everything I know about how programs actually work from reading it. The first chapter builds a tiny little language—basically, a calculator for simple arithmetic expressions. Each subsequent chapter adds a feature like lists, variables, or functions, but rather than presenting a single design Kamin talks through alternatives. How could variable scoping work? How should it work, and how do implementation details and ease of use affect each other? Its approach was one of the inspirations for 500 Lines or Less, and it’s one of the few books I would put beside Kernighan’s classics for the clarity of its prose. My copy disappeared years ago, but if you can find one, it still has a lot to teach.

Where the Time Goes

Bar chart of time spent on different occupations

Styling Diagrams for Software Design by Example

I use draw.io to create diagrams for my books. Right now I use Comic Sans as a font and the sketch style pioneered by rough.js because people give more honest feedback on work that looks rough:

Comic Sans and sketch style

I recently switched to the Atkinson Hyperlegible font for this site to improve accessibility, and am wondering whether my diagrams would be easier for people to read if I did the same. My options are to use Atkinson Hyperlegible with the sketch style:

Comic Sans and sketch style

or to use it with a smooth drawing style:

Comic Sans and sketch style

If you have a preference, please let me know. I’d be grateful if you could also let me know if you have visual difficulties or work with people who do, since my main concern is accessibility rather than aesthetics. Thanks in advance for your help.

Software Design by Example Summary

I wrote 22 posts this month to summarize the chapters in Software Design by Example and to ask who’d be interested in taking an online class based on its Python successor. The list below links to each post in the series, and to the terms defined in each of the book’s content chapters.

My next goal is to prepare an hour-long talk summarizing what I learned from writing this book. It will cover some of the same material as “Software Design for Data Scientists” and this paper, but will focus more on how to teach design and what junior programmers are ready to learn. If you or your team would like to listen to me ramble, I’d be very happy to deliver it online in exchange for a small donation to a mutually-agreed charity—please reach out if you’re interested.

  1. Introduction

  2. Systems Programming: anonymous function, asynchronous, Boolean, callback function, cognitive load, command-line argument, console, current working directory, destructuring assignment, edge case, filename extension, filesystem, filter, globbing, idiomatic, log message, path (in filesystem), protocol, scope, single-threaded, string interpolation.

  3. Asynchronous Programming: call stack, character encoding, class, constructor, event loop, exception, fluent interface, method, method chaining, non-blocking execution, promise, promisification, protocol, UTF-8.

  4. Unit Testing: absolute error, actual result (of test), assertion, caching, defensive programming, design pattern, dynamic loading, error (in a test), exception handler, expected result (of test), exploratory programming, fail (a test), fixture, global variable, introspection, lifecycle, pass (a test), relative error, side effect, Singleton pattern, test runner, test subject, throw (exception), unit test.

  5. File Backup: Application Programming Interface, collision, comma-separated values, Coordinated Universal Time, cryptographic hash function, data migration, handler, hash code, hash function, JavaScript Object Notation, mock object, pipe, race condition, SHA-1 hash, stream, streaming API, Time of check/time of use, timestamp, version control system.

  6. Data Tables: character encoding, column-major storage, data frame, fixed-width (of strings), fixed-width (of strings), garbage collection, heterogeneous, homogeneous, immutable, index (in a database), JavaScript Object Notation, join, pad (a string), row-major storage, sparse matrix, SQL, tagged data, test harness.

  7. Pattern Matching: base class, Chain of Responsibility pattern, child (in a tree), coupling, depth-first, derived class, Document Object Model, eager matching, eager matching, greedy algorithm, lazy matching, node, Open-Closed Principle, polymorphism, query selector, regular expression, scope creep, test-driven development.

  8. Parsing Expressions: finite state machine, literal, parser, precedence, token, Turing Machine, well formed, YAML.

  9. Page Templates: bare object, dynamic scoping, environment, lexical scoping, stack frame, static site generator, Visitor pattern.

  10. Build Manager: automatic variable, build manager, build recipe, build rule, build target, compiled language, cycle (in a graph), dependency, directed acyclic graph, driver, interpreted language, link (a program), pattern rule, runnable documentation, stale (in build), Template Method pattern, topological order.

  11. Layout Engine: attribute, cache, confirmation bias, design by contract, easy mode, layout engine, Liskov Substitution Principle, query selector, signature, z-buffering.

  12. File Interpolator: header file, literate programming, loader, sandbox, search path, shell variable.

  13. Module Loader: absolute path, alias, circular dependency, closure, directed graph, encapsulate, immediately-invoked function expression (IIFE), inner function, Least Recently Used cache, namespace, plugin architecture.

  14. Style Checker: abstract syntax tree, Adapter pattern, column-major storage, dynamic lookup, generator function, intrinsic complexity, Iterator pattern, linter, Markdown, row-major storage, walk (a tree).

  15. Code Generator: byte code, code coverage (in testing), compiler, Decorator pattern, macro, nested function, two hard problems in computer science.

  16. Documentation Generator: accumulator, block comment, deprecation, doc comment, line comment, slug.

  17. Module Bundler: entry point, module bundler, transitive closure.

  18. Package Manager: backward-compatible, combinatorial explosion, heuristic, manifest, patch, prune, SAT solver, scoring function, seed, semantic versioning.

  19. Virtual Machine: Application Binary Interface, assembler, assembly code, bitwise operation, disassembler, instruction pointer, instruction set, label (address in memory), op code, register, virtual machine, word (of memory).

  20. Debugger: breakpoint, source map, tab completion, watchpoint.

  21. Conclusion

  22. Would You Take This Class?

Would You Take This Class?

I plan hope to spend February and March revising the Python version of Software Design by Example. I’m thinking about creating a slide deck for each of the chapters to help me do that—boiling things down to point form helps me see what’s out of order or missing entirely. The best way to test a slide deck is to teach from it, so I’d like to know who’d be interested in an hour-a-week online class that taught participants how to build a dozen applications from the list below in Python:

  • a testing framework
  • an interpreter
  • a versioned file backup system
  • a file cache
  • a dataframe
  • an object persistence framework
  • a binary storage framework
  • a build manager
  • a static site generator
  • a page layout engine
  • a text editor
  • a web server
  • a package manager
  • a regular expression matcher
  • a regular expression parser
  • a style checker
  • a virtual machine
  • a debugger

The cost would be a donation equivalent to a day’s pay to a mutually-agreed charity. Breaking that down:

If you’re interested, please let me know; if a dozen people are willing to commit, I think it would be a lot of fun.

Who Is This Class For?

These three personas define this class’s intended audience:

Slicing another way, participants should be able to:

Software Design by Example: Conclusion

We’ve come a long way since the start of this series of posts. I hope that understanding how version control systems, unit testing frameworks, style checkers, module bundlers, and debuggers work will help you use them better. And as I wrote four weeks ago, I hope that understanding their designs will help you make things that are elegant as well as useful.

But I want this post to be a beginning, not an ending. My examples are almost all command-line applications; I would love to see a second volume that took apart front-end tools and distributed applications, but don’t know enough to write it. I do, however, have a lot of experience editing, so if you’d like to contribute a chapter to a collection that described working models of editors, message queues, databases, and the like, please get in touch.

I would also enjoy hearing from anyone who is using this material in class. What didn’t make sense? Are the exercises comprehensible and at the right level? Where are more diagrams needed? I’m always happy to do a guest lecture in exchange for feedback…

Finally, I’ve spent 40 years building things out of bits. None of my code was ever widely used, but I’m quite proud of how well some of it was made. I hope that one day we’ll have a vocabulary for talking about that—about what it means for software to be “beautiful”. Until then:

Start where you are
Use what you have
Help who you can

Cover of 'Software Design by Example'

Benchmarking Languages

Back in the 1980s, R.W. Hockney introduced two measures for quantifying the performance of pipelined machines. The first, r, is the pipeline’s maximum possible performance on an infinitely large data set. The second, n½, is how much data you need in order to reach half that theoretical peak, i.e., how big your problem has to be to offset startup overheads. In the mid-90s, I suggested that another performance measure would be equally interesting: how long it takes to write code that achieves half a machine’s rated performance. I called it p½, and observed that for many supercomputers it was effectively infinity, since no real application ever achieves more than 10-20% of the performance that manufacturers claim.

I no longer believe that measuring time-to-solution is the right approach, but I still think that we can learn a lot from comparing implementations of common problems in different languages. The Cowichan Problems that I proposed for measuring p½ aren’t right for comparing the general-purpose languages vying to be the next Python, but I think we’d all learn a lot if people implemented a few of the examples from Software Design by Example in Rust, Zig, Nim, Haskell, Elixir, F#, Scala, and what-not. My choices would be:

Between them, these five examples touch on most of the core design ideas I used when programming, such as actions as objects (the editor), hashing (the file backup tool), dependency management (the build manager), introspection (the style checker), and asynchronous I/O (the web server). They are also a nearly-complete minimal development stack—only “nearly” because I felt the ideas in a unit testing framework overlapped enough with those in a style checker that the set only needed one or the other.

I think these problems are different enough from each other that each one’s implementation would highlight something about the language that would otherwise not be seen. If you’re teaching a software design class with an interesting new language and want to give these to your students as assignments, please let me know how it goes.

Software Design by Example 20: Debugger

The penultimate chapter of Software Design by Example explores one of the question that sparked this book: how does an interactive breakpointing debugger work? The first time I saw one in action I was completely baffled. How could the computer stop where I wanted and then keep going? How could it interact with me? Finding out was a lot harder than it should have been: there were (and are) lots of books and courses on compilers and operating systems, but nothing that explained the workings of this magical, powerful tool.

This chapter answers my long-ago questions by building a simple single-stepping debugger It also shows how to create unit tests for interactive applications. That’s a lot to pack into just two thousand words, but most of the ideas have come up before.

I’m pleased that I was able to answer my younger self’s questions. However, I’m still annoyed by the way debuggers are mostly left out of the curriculum. Of the forty-odd students I interviewed for software engineering intern positions last year, fewer than a dozen knew how to set a breakpoint in an IDE. Andreas Zeller’s The Debugging Book has a very informative chapter, and I recommend the whole book to every working programmer, but I’ll believe our profession is serious about building code that works when every undergraduate computer science program has a full-semester course on how to use tools to find and fix bugs.

Testing interactive application
Figure 20.2: Replacing input and output to test interactive applications.

Terms defined: breakpoint, source map, tab completion, watchpoint.

Software Design by Example 19: Virtual Machine

To explain how languages like JavaScript actually work, Chapter 19 of Software Design by Example builds a very simple virtual machine (VM) and shows how to turn a low-level programming language into instructions for it. The VM has three parts:

  1. An instruction pointer that holds the memory address of the next instruction to execute.

  2. Four registers that instructions can access directly. There are no memory-to-memory operations: everything happens in or through registers.

  3. 256 words of memory, each of which can store a single value.

I was surprised that this chapter didn’t need to be longer. Looking over it again, though, I realize there’s a lot of code I didn’t have to explain. Once readers have seen how add works, they don’t need to see subtract; once they’ve seen how to copy a value from memory into a register, they don’t need to see how to copy values in the other direction, and so on.

I really wish I’d had enough spoons to build an interactive visualization of this VM. Since I didn’t, I suggest you check out the game Human Resource Machine. And if you want to know (a lot) more about this level in the tower of computing, Bob Nystrom’s Crafting Interpreters is free to read online and one of the most beautifully crafted books on computing I’ve ever seen.

Virtual machine architecture
Figure 19.1: Architecture of the virtual machine.

Terms defined: Application Binary Interface, assembler, assembly code, bitwise operation, disassembler, instruction pointer, instruction set, label (address in memory), op code, register, virtual machine, word (of memory).

Blackberries

When I was young, I got up early to go out and pick blackberries several times each summer. My mum turned some into jam, but she always froze a pail or two so that she could make me a pie for my birthday. Every year it was sweet and tart and bursting with flavor—exactly what I needed in BC in January when the sky had been gray for weeks and everything felt damp.

Mum died three years ago just a few days before she was going to come to visit us in Toronto. As we were sorting out her home I found two buckets of blackberries in her freezer. I don’t know who picked them for her, but I’d like to think she was planning to smuggle them onto the plane somehow so she could make me a birthday pie.

My wife used the last of them this afternoon. I’m sixty years old; the seeds will get stuck in my teeth and I’ll probably need to wipe my eyes, but it’ll be worth it for just a little taste of those long-ago summers.

blackberry pie