I have written four technical books (with three more in progress) and have edited seven others.
They have all been different, but I do now have something approximating a process.
Draw a concept map for major topics (about 20 nodes and 40 links).
Start writing point-form notes for each chapter.
Topics map to chapters about 2:1.
Leave FIXME markers where figures are needed but don’t draw them yet.
Write notes and code for 3-4 hours/week over 12 months (on average—it’s very bursty).
Write all example code while drafting the chapter.
Do a lot of rearranging at this stage, e.g., introduce sub-topic X as part of main topic Y.
Add some topics/chapters at this point:
“I need to explain Z in order for X and Y to make sense and it doesn’t fit an existing chapter.”
Write 8-10 exercises for each chapter and revise the point-form notes so that this is possible.
Do some more rearranging at this stage.
More importantly, cut material because it just doesn’t fit this book.
Turn the point-form notes into prose.
One hour a day for 5 weeks turned 19 chapters of point-form notes (about 4 pages per chapter) into finished prose.
The finished prose is anywhere from 140% to 250% of the length of the original notes.
Draw the diagrams.
About 90% of the intended diagrams survive; I don’t think I added any at this stage, but I should have—I hate drawing diagrams.
One entire chapter was cut at this point because the examples didn’t work and the content didn’t really fit anyway.
At this point I have 385 printed pages.
Based on previous books, it will grow to between 125% and 150% of that
as I explain things that were clear to me but aren't to anyone else.
One of my favorite Night Gallery episodes is
“But Can She Type?”,
in which the protagonist finds herself in a parallel universe
where secretaries are treated the way rock stars are in ours.
I think about it every time I stub my mind on questions like:
Why doesn’t Canada have a Natural Sciences and Engineering Learning Council
to fund teaching the way NSERC funds research?
Why doesn’t the US have a National Learning Foundation on par with the NSF?
I’ve used CTAN (the Comprehensive TeX Archive Network),
CPAN (for Perl),
CRAN (for R),
and PyPI (Python’s equivalent—the letter ‘P’ was already taken).
Why isn’t there a Comprehensive Learning Archive Network (CLAN)?
I recognize that the Reusability Paradox
would make the lessons in CLAN less immediately useful than the libraries in CRAN,
but we could still do a lot to make lessons more discoverable.
So far as I’ve been able to determine,
no computer science department at a major Canadian university
has ever had a member of its teaching faculty as chair or head.
On the face of it,
isn’t the person who can keep the 10-section intro class running smoothly
the best choice for running the department as a whole?
Somewhere out there is a universe in which
people have recognized that education is at least as important as innovation.
Somewhere out there,
communicating and inspiring is considered just as valuable as
filling in another square millimeter in the great coloring book we live in.
Somewhere out there—but not here.
To follow up on this post about its topics
and this post about using glossaries to summarize lesson content,
here’s a list of the terms defined in each chapter.
What I want to build (or find) next is a tool that will take data like this,
find related uses (“glob” for “globbing”, “method chain” for “method chaining”, etc.),
and tell me if I’m using ideas before explaining them.
I don’t want to have to list all possible synonyms by hand,
any more than I want to have to list all the functions that a module calls.
What I want for lesson maintenance is something that will tell me when I’ve broken something
by adding, cutting, or moving material.
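The core of that checker is simple enough to sketch. Everything below is my own invention for illustration: the sample chapter data is made up, and the crude suffix-stripping in `normalize` is a stand-in for real synonym detection ("glob" for "globbing" and so on), which would need something smarter.

```python
# Sketch of a "used before defined" checker for lesson glossaries.
# The chapter data and the normalization rule are both illustrative only.

CHAPTERS = [
    {"defines": {"glob"}, "uses": {"globbing"}},
    {"defines": {"method chain"}, "uses": {"method chaining", "glob"}},
    {"defines": set(), "uses": {"regular expression"}},  # never defined
]

def normalize(term):
    """Crudely map related uses to a base term ('globbing' -> 'glob')."""
    if term.endswith("ing"):
        term = term[:-3]
        if len(term) >= 2 and term[-1] == term[-2]:
            term = term[:-1]  # un-double the consonant: 'globb' -> 'glob'
    return term

def check(chapters):
    """Return (chapter_number, term) pairs for terms used before defined."""
    defined = set()
    problems = []
    for number, chapter in enumerate(chapters, start=1):
        defined |= {normalize(term) for term in chapter["defines"]}
        for term in sorted(chapter["uses"]):
            if normalize(term) not in defined:
                problems.append((number, term))
    return problems

print(check(CHAPTERS))  # -> [(3, 'regular expression')]
```

A real version would read the per-chapter term lists from the lesson source rather than a hard-coded structure, but the ordering check itself is just this loop.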
I was helping some friends analyze some data today,
and discovered that the ./data directory in the project they had inherited
contained a file called manifest.csv
that was loaded and echoed at the top of their analysis notebook.
I can’t show you what it contained—their data isn’t public—but
the equivalent for Allison Horst’s Palmer Penguins dataset
would look something like this:
penguins,species,text,NA,false,common name of species
penguins,island,text,NA,false,island where data collected
penguins,bill_length,number,mm,true,bill length (Figure 1)
penguins,bill_depth,number,mm,true,bill depth (Figure 1)
penguins,flipper_length,number,mm,true,flipper length (Figure 2)
It’s easier to see and appreciate laid out like this:
penguins  species         text    NA  false  common name of species
penguins  island          text    NA  false  island where data collected
penguins  bill_length     number  mm  true   bill length (Figure 1)
penguins  bill_depth      number  mm  true   bill depth (Figure 1)
penguins  flipper_length  number  mm  true   flipper length (Figure 2)
The table name is included because
the manifest.csv I’m imitating described several related data files;
one of the column descriptions even said,
“Foreign key into other_table/other_name”.
This doesn’t include everything—for example,
it doesn’t specify which text fields are enumerations (or factors, if you’re a statistician)—and
the figures referred to in the original manifest.csv aren’t anywhere in the project repository—but
wouldn’t life be better if every project you worked with came with something like this?
Having once spent several days trying to figure out
which temperature measurements in a dataset were °C and which were °F,
I found that having SI units somewhere discoverable
was enough to make me swoon.
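A manifest like this also makes automated checking easy to sketch. Everything in the snippet below is my own invention for illustration: the manifest shown above has no header row, so the field names (`table`, `field`, `type`, `unit`, `measured`, `description`) are assumptions, and the helper functions are not part of any real project.

```python
# Sketch: check a data file's columns against a headerless manifest.csv.
# Field names and helpers are assumptions for illustration.
import csv
from io import StringIO

MANIFEST = """\
penguins,species,text,NA,false,common name of species
penguins,island,text,NA,false,island where data collected
penguins,bill_length,number,mm,true,bill length (Figure 1)
"""

FIELDS = ["table", "field", "type", "unit", "measured", "description"]

def load_manifest(stream):
    """Read a headerless manifest into a list of dictionaries."""
    return [dict(zip(FIELDS, row)) for row in csv.reader(stream)]

def check_columns(manifest, table, actual_columns):
    """Return (described but absent, present but undescribed) column names."""
    expected = {entry["field"] for entry in manifest if entry["table"] == table}
    actual = set(actual_columns)
    return sorted(expected - actual), sorted(actual - expected)

manifest = load_manifest(StringIO(MANIFEST))
missing, extra = check_columns(manifest, "penguins", ["species", "island", "year"])
print("missing:", missing)  # -> missing: ['bill_length']
print("extra:", extra)      # -> extra: ['year']
```

Run against every table named in the manifest, a check like this would catch renamed or dropped columns the moment a project inherits new data.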
One of the most popular talks I give is on how to run a meeting.
In it and in my description of Martha’s Rules
I talk about writing short memos to summarize proposals so that people know what they’re being asked to support
(or equivalently what they’re actually opposing),
but I don’t show any examples.
Writing these makes meetings fairer:
if all discussion is off the cuff
then the deck is stacked against people who aren’t extroverts,
aren’t on the end of a reliable low-latency connection,
or aren’t fluent in the language being used in the meeting.
This memo is an edited version of what I would write for something simple,
and shows the level of detail I would include if money were needed.
The memo’s point is not to persuade but to summarize,
and to make it clear to everyone what they’re actually agreeing to.
Summary: Use verification of Zipf’s Law as a running example through the entire book.
Research Software Engineering with Python is going to introduce readers to a lot of new tools and terminology.
Using different problems for the major examples in each chapter will be an extra burden for readers,
who will have to learn about seismology, baseball, and whatever else we choose
as well as Git, Make, and Python packaging.
Using different problems will also be a burden on us,
as we will have to write and maintain several different (small) projects or packages.
Proposal
Use verification of Zipf’s Law as the running example throughout all chapters.
Lots of raw data available (novels from the Gutenberg Project, Wikipedia pages, etc.).
Raw data is messy and cleanup has multiple stages so we can show workflows.
Problem is very easy to describe and only requires a bit of math.
Data is all open license so we can build/share a package without any worries.
Budget and Staffing
No budget required.
Approx. 2 hours for one person to get and package raw data.
Alternatives
Different chapters use different examples.
Pro: less coupling between chapters (can change one without ripple effects on later chapters).
Pro: variety is the spice of life (multiple examples might be more compelling).
Con: cognitive load (learners have to get up to speed with multiple examples).
Con: the more domains our examples come from, the more likely it is that a learner will hit one that’s unfamiliar.
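The proposal's claim that the problem "only requires a bit of math" is easy to back up. This is only a toy sketch of the running example (the sample text and function names are mine, not the book's): count word frequencies, then compare them with Zipf's prediction that the word of rank r occurs about 1/r as often as the most common word.

```python
# Toy sketch of the proposed running example: word frequencies vs. Zipf's Law.
# The sample text is made up for illustration.
from collections import Counter

TEXT = "the cat saw the dog and the dog saw the cat and the bird"

def word_frequencies(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    return Counter(text.lower().split()).most_common()

def zipf_prediction(frequencies):
    """Predicted count at each rank if Zipf's Law held exactly."""
    top = frequencies[0][1]
    return [top / rank for rank in range(1, len(frequencies) + 1)]

freqs = word_frequencies(TEXT)
print(freqs[0])                   # -> ('the', 5)
print(zipf_prediction(freqs)[:2])  # -> [5.0, 2.5]
```

The full example in the book would pull its text from Project Gutenberg novels instead of a one-line string, which is exactly where the messy, multi-stage cleanup mentioned above comes in.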