
How I Write a Technical Book

I have written four technical books (with three more in progress) and have edited seven others. They have all been different, but I do now have something approximating a process. For Software Tools in JavaScript it went like this:

  1. Draw a concept map for major topics (about 20 nodes and 40 links).

  2. Start writing point-form notes for each chapter.
    • Topics map to chapters about 2:1.
    • Leave FIXME markers where figures are needed but don’t draw them yet.
    • Write notes and code for 3-4 hours/week over 12 months (on average—it’s very bursty).
  3. Write all example code while drafting the chapter.
    • Do a lot of rearranging at this stage, e.g., introduce sub-topic X as part of main topic Y.
    • Add some topics/chapters at this point: “I need to explain Z in order for X and Y to make sense and it doesn’t fit an existing chapter.”
  4. Write 8-10 exercises for each chapter and revise the point-form notes so that this is possible.
    • Do some more rearranging at this stage.
    • More importantly, cut material because it just doesn’t fit this book.
  5. Turn the point-form notes into prose.
    • One hour a day for 5 weeks turned 19 chapters of point-form notes (about 4 pages per chapter) into finished prose.
    • The finished prose is anywhere from 140% to 250% of the length of the original notes.
  6. Draw the diagrams.
    • About 90% of the intended diagrams survive; I don’t think I added any at this stage, but I should have—I hate drawing diagrams.
    • One entire chapter was cut at this point because the examples didn’t work and the content didn’t really fit anyway.

At this point I have 385 printed pages. Based on previous books, I expect it to grow by 125% to 150% as feedback shows me things that were clear to me but aren't to anyone else.

But Can She Type?

One of my favorite Twilight Zone episodes is “But Can She Type?”, in which the protagonist finds herself in a parallel universe where secretaries are treated the way rock stars are in ours. I think about it every time I stub my mind on questions like:

  • Why doesn't Canada have a Natural Sciences and Engineering Learning Council to fund teaching the way NSERC funds research? Why doesn't the US have a National Learning Foundation on par with the NSF?

  • I’ve used CTAN (the Comprehensive TeX Archive Network), CPAN (for Perl), CRAN (for R), and PyPI (Python’s equivalent—the letter ‘P’ was already taken). Why isn’t there a Comprehensive Learning Archive Network (CLAN)? I recognize that the Reusability Paradox would make the lessons in CLAN less immediately useful than the libraries in CRAN, but we could still do a lot to make lessons more discoverable.

  • So far as I’ve been able to determine, no computer science department at a major Canadian university has ever had a member of its teaching faculty as chair or head. Why not? On the face of it, isn’t the person who can keep the 10-section intro class running smoothly the best choice for running the department as a whole?

Somewhere out there is a universe in which people have recognized that education is at least as important as innovation. Somewhere out there, communicating and inspiring is considered just as valuable as filling in another square millimeter in the great coloring book we live in. Somewhere out there—but not here.

Software Tools in JavaScript: Terms

I still have 48 figures to draw for Software Tools in JavaScript, but to follow up on this post about its topics and this post about using glossaries to summarize lesson content, here’s a list of the terms defined in each chapter. What I want to build (or find) next is a tool that will take data like this, find related uses (“glob” for “globbing”, “method chain” for “method chaining”, etc.), and tell me if I’m using ideas before explaining them. I don’t want to have to list all possible synonyms by hand, any more than I want to have to list all the functions that a module calls. Instead, what I want for lesson maintenance is something that will tell me when I’ve broken something up by adding, cutting, or moving material.

Systems Programming: anonymous function, asynchronous, Boolean, callback, cognitive load, command-line argument, console, current working directory, destructuring assignment, edge case, filesystem, filter, globbing, idiomatic, log message, path, promise, protocol, scope, single-threaded, string interpolation
Asynchronous Programming: call stack, character encoding, class, constructor, event loop, exception, method, method chaining, non-blocking execution, promisification, UTF-8
Unit Testing: actual result, assertion, caching, defensive programming, design pattern, dynamic loading, error (test), exception handler, expected result, exploratory programming, fail (test), fixture, global variable, introspection, lifecycle, pass (test), side effect, Singleton pattern, test runner, throw exception, unit test
File Backup: collision, cryptographic hash function, csv, handler, hash code, hash function, JSON, mock object, pipe, race condition, stream, timestamp, time of check/time of use, UTC, version control system
Data Tables: column major, data frame, garbage collection, heterogeneous, homogeneous, immutable, row major, SQL, tagged data, test harness
Pattern Matching: base class, Chain of Responsibility pattern, depth-first search, derived class, greedy algorithm, query selector, regular expression, test-driven development
Parsing Expressions: literal, precedence, token, well-formed, YAML
Page Templates: bare object, DOM, dynamic scoping, environment, lexical scoping, stack frame, Visitor pattern
Build Manager: automatic variable, build manager, build recipe, build rule, build stale, build target, compiled language, cycle, directed acyclic graph, dependency, driver, interpreted language, link, pattern rule, runnable documentation, Template Method pattern, topological order
Layout Engine: attribute, cache, confirmation bias, coupling, DOM selector, easy mode, layout engine, function signature
File Interpolator: header file, loader, sandbox, search path, shell variable
Module Loader: absolute path, alias, circular dependency, closure, directed graph, encapsulate, immediately-invoked function expression, namespace, plugin architecture
Style Checker: abstract syntax tree, dynamic lookup, generator function, iterator pattern, linter, Markdown, walk (tree)
Code Generator: byte code, code coverage, Decorator pattern, macro, nested function
Documentation Generator: accumulator, block comment, doc comment, line comment, slug
Module Bundler: entry point, module bundler, transitive closure
Package Manager: backward-compatible, combinatorial explosion, heuristic, manifest, patch, prune, SAT solver, semantic versioning
Virtual Machine: Application Binary Interface, assembler, assembly code, bitwise operation, compiler, instruction pointer, instruction set, label address, op code, register, virtual machine, word (memory)
breakpoint, source map
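As a rough sketch of the simplest version of the checker described above, assuming each chapter's prose is available as a string in book order alongside its list of defined terms, something like the following could flag terms that show up before the chapter that defines them. All of the names here (`stem`, `stems`, `contains`, `used_before_defined`) are hypothetical, and `stem` is a deliberately crude suffix-stripper standing in for a real stemmer: it matches "glob" with "globbing" and "method chain" with "method chaining", but it will miss some variants and produce some false matches.

```python
import re

SUFFIXES = ("ing", "es", "ed", "s")  # crude stand-in for real stemming


def stem(word: str) -> str:
    """Strip one common suffix, then a doubled final consonant ("globb" -> "glob")."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def stems(text: str) -> list:
    """Split text into alphabetic words and stem each one."""
    return [stem(w) for w in re.findall(r"[A-Za-z]+", text)]


def contains(haystack: list, needle: list) -> bool:
    """True if needle appears as a contiguous run inside haystack."""
    n = len(needle)
    return any(haystack[i : i + n] == needle for i in range(len(haystack) - n + 1))


def used_before_defined(chapters: list) -> list:
    """chapters holds (title, defined_terms, text) tuples in book order.

    Returns (term, chapter defining it, earlier chapter using it) triples.
    """
    problems = []
    for index, (defining_title, terms, _) in enumerate(chapters):
        for term in terms:
            key = stems(term)
            for title, _, text in chapters[:index]:
                if contains(stems(text), key):
                    problems.append((term, defining_title, title))
    return problems


# Made-up chapter text for illustration only
chapters = [
    ("Systems Programming", ["globbing"], "A method chain can filter the glob results."),
    ("Asynchronous Programming", ["method chaining"], "We chain methods on promises."),
]
print(used_before_defined(chapters))
# → [('method chaining', 'Asynchronous Programming', 'Systems Programming')]
```

Comparing runs of stems rather than raw strings is what lets a multi-word term match its inflected forms; genuine synonyms would still need a hand-written list or something smarter.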

Data Dictionaries

I was helping some friends analyze some data today, and discovered that the ./data directory in the project they had inherited contained a file called manifest.csv that was loaded and echoed at the top of their analysis notebook. I can't show you what it contained (their data isn't public), but the equivalent for Allison Horst's Palmer Penguins dataset would look something like this:

penguins,species,text,NA,false,common name of species
penguins,island,text,NA,false,island where data collected
penguins,bill_length,number,mm,true,bill length (Figure 1)
penguins,bill_depth,number,mm,true,bill depth (Figure 1)
penguins,flipper_length,number,mm,true,flipper length (Figure 2)
penguins,body_mass_g,number,g,true,bird weight
penguins,sex,text,NA,true,bird sex

It's easier to see and appreciate laid out like this:

table     column          type    unit  na     meaning
penguins  species         text    NA    false  common name of species
penguins  island          text    NA    false  island where data collected
penguins  bill_length     number  mm    true   bill length (Figure 1)
penguins  bill_depth      number  mm    true   bill depth (Figure 1)
penguins  flipper_length  number  mm    true   flipper length (Figure 2)
penguins  body_mass_g     number  g     true   bird weight
penguins  sex             text    NA    true   bird sex

The table name is included because the manifest.csv I’m imitating described several related data files; one of the column descriptions even said, “Foreign key into other_table/other_name”.

This doesn't include everything. For example, it doesn't specify which text fields are enumerations (or factors, if you're a statistician), and the figures referred to in the original manifest.csv aren't anywhere in the project repository. But wouldn't life be better if every project you worked with came with something like this? I once spent several days trying to figure out which temperature measurements in a dataset were °C and which were °F, so having SI units recorded somewhere discoverable was enough to make me swoon.
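A manifest like this is also machine-readable, so the data can be checked against it. Here is a minimal sketch using only Python's standard library, assuming the six columns appear in the order shown above with no header row, that missing values are spelled `NA` or left empty, and that `false` in the `na` column means a column may not contain missing values. The names `load_manifest` and `check_records` are hypothetical; this is not code from the project I was looking at.

```python
import csv
import io

# Assumed column order of the headerless manifest shown above
FIELDS = ["table", "column", "type", "unit", "na", "meaning"]
MISSING = {"", "NA"}  # assumption: how missing values are spelled in the data


def load_manifest(text: str, table: str) -> dict:
    """Map each column name in one table to its manifest entry."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return {row["column"]: row for row in reader if row["table"] == table}


def check_records(records: list, manifest: dict) -> list:
    """Report missing values that aren't allowed and non-numeric numbers."""
    problems = []
    for i, record in enumerate(records):
        for name, entry in manifest.items():
            value = record.get(name)
            if value is None or value in MISSING:
                if entry["na"] == "false":
                    problems.append(f"row {i}: {name} is missing")
                continue
            if entry["type"] == "number":
                try:
                    float(value)
                except ValueError:
                    problems.append(f"row {i}: {name}={value!r} is not a number")
    return problems


MANIFEST = """\
penguins,species,text,NA,false,common name of species
penguins,body_mass_g,number,g,true,bird weight
"""
records = [
    {"species": "Adelie", "body_mass_g": "3750"},
    {"species": "NA", "body_mass_g": "heavy"},
]
for problem in check_records(records, load_manifest(MANIFEST, "penguins")):
    print(problem)
# → row 1: species is missing
# → row 1: body_mass_g='heavy' is not a number
```

Keying the manifest by table as well as column is what would let the same file describe several related data files, as the original did.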

How to Write a Memo

One of the most popular talks I give is on how to run a meeting. In it and in my description of Martha's Rules I talk about writing short memos to summarize proposals so that people know what they're being asked to support (or equivalently what they're actually opposing), but I don't show any examples. Writing these makes meetings fairer: if all discussion is off the cuff, the deck is stacked against anyone who isn't an extrovert, a quick thinker, on the end of a reliable low-latency connection, or fluent in the language being used in the meeting.

This memo is an edited version of what I would write for something simple; this proposal shows the level of detail I would include if money was needed. Remember, the memo’s point is not to persuade but to summarize, and to make it clear to everyone what they’re actually agreeing to.

Summary: Use verification of Zipf’s Law as a running example through the entire book.

Background

Research Software Engineering with Python is going to introduce readers to a lot of new tools and terminology. Using different problems for the major examples in each chapter will be an extra burden for readers, who will have to learn about seismology, baseball, and whatever else we choose as well as Git, Make, and Python packaging. Using different problems will also be a burden on us, as we will have to write and maintain several different (small) projects or packages.

Proposal

Use verification of Zipf’s Law as the running example throughout all chapters.

  1. Lots of raw data available (novels from Project Gutenberg, Wikipedia pages, etc.).

  2. Raw data is messy and cleanup has multiple stages so we can show workflows.

  3. Problem is very easy to describe and only requires a bit of math.

  4. All of the data is openly licensed, so we can build and share a package without any worries.
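To show how little math is involved, here is a minimal sketch (not code from the book) assuming Zipf's law in the form f(r) ∝ r^(-α), where f(r) is the frequency of the r-th most common word and α is expected to be close to 1. It estimates α with an ordinary least-squares line through the points (log r, log f). That is a quick estimator rather than a careful one; a more rigorous analysis would probably use a maximum-likelihood fit. The function names `word_counts` and `zipf_exponent` are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lower-cased words; apostrophes are kept so "don't" is one word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def zipf_exponent(counts: Counter) -> float:
    """Estimate alpha in f(r) ~ r**(-alpha) by least squares on log-log data.

    Needs at least two distinct words, otherwise the slope is undefined.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope  # frequency falls as rank rises, so negate the slope


# Frequencies of exactly 60/r follow Zipf's law with alpha = 1
ideal = Counter({"a": 60, "b": 30, "c": 20, "d": 15, "e": 12})
print(round(zipf_exponent(ideal), 6))  # → 1.0
```

For a real novel, `zipf_exponent(word_counts(text))` should come out somewhere near 1 if the law holds; the messy cleanup stages mentioned above all happen before `word_counts` is called.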

Budget and Staffing

  • No budget required.
  • Approx. 2 hours for one person to get and package raw data.

Alternatives

  1. Different chapters use different examples.
    • Pro: less coupling between chapters (can change one without ripple effects on later chapters).
    • Pro: variety is the spice of life (multiple examples might be more compelling).
    • Con: cognitive load (learners have to get up to speed with multiple examples).
    • Con: the more domains our examples come from, the more likely it is that a learner will hit one that’s unfamiliar.

History

  • 2019-06-06/all: adopted.
  • 2019-05-12/GVW: incorporate revisions from RJ.
  • 2019-05-03/GVW: first draft.