The first is that there’s been no interest in the book among people who teach software design and software engineering.
I taught a course called “Software Architecture” several times at the University of Toronto in the early 2000s;
while lots of textbooks had those two words in their title,
none of them spent more than a few pages describing the designs of actual systems.
My frustration with that led to Beautiful Code
and The Architecture of Open Source Applications,
but as popular as they were among working programmers,
only a handful of teachers used them in class.
The chapters were written at very different levels and used many different programming languages,
which meant they required far more background knowledge than most undergraduates had.
STJS fixes that by tackling problems at more or less the same level,
in a domain that software engineering students will be familiar with (programming tools),
and by using a single language that most potential readers will already have encountered.
Despite all that, and being free,
none of the software engineering faculty I’ve reached out to have shown interest in using it.
But that’s the lesser of my disappointments.
(Like all authors, I’ve become very good at cataloguing and ranking disappointments…)
I never wanted to write this book myself:
my plan was to write a few chapters to get the ball rolling
and then invite people who aren’t yet well known, but should be,
to contribute a chapter each.
When I look at Beautiful Code now,
what strikes me is how homogeneous the contributors were:
while a quarter of the people we reached out to were women,
35 of the final 36 contributors were men
(and almost all of them were white).
AOSA wasn’t quite as bad,
but I wanted STJS to be better
because I want computer science and the tech industry to be better.
Because here’s the thing:
you’re not reading this because I program—you’re reading this because I write.
Thanks to a few lucky accidents
(like my father being a high school English teacher)
I can put words together better and faster than most people.
Whatever clout I have is a result of that,
not my middling-good ability to code in C or Python,
and I figured that if it worked for me by accident
then maybe it could be made to work for other people by design.
When I started work on STJS three years ago,
my plan was to invite people from groups our industry has marginalized or excluded
to write chapters,
and to turn those chapters into conference talks,
to help raise their profiles.
Of the people I contacted,
only three agreed to,
and none of them delivered.
I understand why not:
if you’re not a straight white or Asian male
then you already have to do a lot of extra work to get ahead in tech.
Doing even more work on the off chance that it will lead to something else
is a bad investment,
and being asked to do extra work without pay—again—is just wearying.
I’m pleased with the book itself:
there are still some formatting glitches,
and some of the diagrams could be clearer,
but it’s not bad for a first release.
If I could wind the clock back to 2018, though,
I wouldn’t have started it.
Designing things well matters to me—thinking and talking about it
reminds me of my brother—but
fixing our broken industry matters more.
I still hope to see it adopted in courses,
and I still think a second volume with contributions from people
who deserve to be on stage just as much as I ever did
might make a difference,
but looking at it now,
I think I could have helped more people with those hours
by doing something else.
Footnote: I received two messages within minutes of publishing this post.
The first told me not to be so down on myself,
to which I replied that
I need to decide whether to put in a hundred hours to finish
Building Software Together;
if I don’t reflect on this project,
I’m unlikely to do better with the next.
The second told me, “Don’t be such a bleeding heart.
If diversty [sic] hires don’t want to work hard to get ahead they don’t deserve to.”
I’m willing to bet those “diversity hires” have worked harder
than the person who wrote that message has ever had to;
I’m also willing to bet that he’ll never let himself see that.
I recently helped a group of about fifteen people set up a new research software engineering project
(where by “new” I mean “restart something that was in bits and pieces scattered across half the internet”).
They all had GitHub accounts already,
and a couple of them had read Research Software Engineering with Python,
but only one had any formal training as a programmer
(a 12-week bootcamp four years ago).
Here’s what we did in order—I’d be grateful for suggestions about what we missed
or what you would reprioritize.
Create a mailing list for the project.
The team voted 2:1 for email over Slack because they want better search and fewer interruptions.
Create a new GitHub organization for the project and add everyone to it.
So that nothing belonging to the project is under a personal account.
Create a new repo within that GitHub organization.
Everything is in one repo for now, but that might change.
Redefine the tags in that repo.
Governance: discussion (including questions) and proposal (for votable items).
Issues: bug and request.
Pull requests: fix, enhancement, docs, and refactor.
Meta: paused, help wanted, good first issue.
Add README, LICENSE, CODE_OF_CONDUCT, GOVERNANCE, Makefile, and .gitignore to the repo.
We settled on Make because nobody could agree on what to use instead.
Create two pip requirements files:
requirements.txt is a minimal setup for using the software.
development.txt sources that and adds everything needed for building, testing, and documenting.
Create socks, docs, and tests directories in the root of the repo along with a setup.py file.
Pretty standard structure for a pip-installable Python package (and no, “socks” isn’t its real name).
Set up pytest for running tests and pdoc for building documentation.
tests/conftest.py for pytest.
A docstring in every __init__.py file (rather than leaving it empty) to make pdoc happy.
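The last item is easy to forget when a new sub-package is added, so here is a minimal sketch of a check that could run alongside the tests. It is an illustration rather than anything the project actually uses: the function name `missing_docstrings` is made up, and it assumes that an `__init__.py` whose module docstring is missing or empty is the thing to flag.

```python
import ast
from pathlib import Path


def missing_docstrings(package_root):
    """Return the __init__.py files under package_root with no module docstring.

    Hypothetical helper: files are parsed with the standard-library ast
    module, so nothing in the package is imported or executed.
    """
    missing = []
    for init_file in sorted(Path(package_root).rglob("__init__.py")):
        tree = ast.parse(init_file.read_text(encoding="utf-8"))
        # get_docstring returns None when the first statement is not a string,
        # and an empty string for an empty docstring; flag both cases.
        if not ast.get_docstring(tree):
            missing.append(str(init_file))
    return missing
```

Calling `missing_docstrings("socks")` from a pytest test and asserting that the result is an empty list would turn the convention into something the test suite enforces.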
For this year’s National Novel Writing Month
I revised a YA novel called Maddy Roo
about a teenage kangaroo whose little sister is kidnapped by robots.
You can read the whole thing as
(for most e-book readers),
I hope you enjoy it;
feedback would be greatly appreciated.
I recently ran a workshop on managing research software projects,
and one of the questions that came up was,
“What does ‘done’ look like?”
There are lots of answers elsewhere for the technical side
but what about project management and governance?
Here’s a first cut of the artifacts used to support those activities
for a project with up to a couple of dozen contributors;
additions, deletions, and corrections would be very welcome.
(As always, please email me:
the last time I opened up comments on this site
it took all of two days for the trolls to show up.)
A shared Google Drive with a doc called “Roles and Responsibilities”
Google Doc because some collaborators aren’t comfortable with Git
And to make it easier to paste in figures and screenshots
Defines roles and explains what each is responsible for in one page
Each role has a doc of its own with its checklists
The same shared Google Drive has one doc per year called (e.g.) “Progress 2022”
Section headings are weekly meeting dates
Table for each week with columns Name, Progress, Plans, and Problems (bullet points)
Anything too long to fit comfortably in the table is linked to an issue in the project’s GitHub repository
Project has a little script that lists issues and PRs touched by each person (reminder)
Weekly hour-long status meeting (which often finishes early)
On Wednesday so that people aren’t scrambling on Friday or a weekend (or holiday Monday) to write status updates
Rotating moderator: last week’s moderator is this week’s note-taker
Before meeting, members star points in the status doc they want to discuss
Moderator draws up agenda based on starred items
Proposals can be done as either Google Docs (in shared folder) or GitHub issues
Must be flagged to moderator the day before the meeting for inclusion
Added to agenda
Project has a single repo with code, website, tutorials, etc.
So that releases are in sync
Uses Google Docs (again) for publicity materials (because non-programmers)
All materials are owned by project account, not personal accounts
Every change larger than a typo produces a new doc
Every doc has date in title, e.g., “University Press Release 2022-05-13”
Budgets for grant proposals, job contracts, etc., are stored in university system
And membership rule:
anyone who has had a PR merged in the last year or made some other significant contribution (as determined by the PI)
List of active members and alumni is in the foot of GOVERNANCE.md
Another small script checks that the tags in each project repository are consistent
and that each issue has at least one tag
Project website has a “skills ladder” on the “Positions” page (even when positions aren’t open)
“What we mean by each of these terms for the research and coding tracks”
Project website has a value statement and a contact address that isn’t anyone’s personal address
Plus a page for publications
Plus a page pointing at all repositories
Plus a “Getting Started” page
And a “Who’s Using Us How” page
And a “People” page
The “help” option for the software includes the URL to the project page
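The two small scripts mentioned in the list above (one listing the issues and PRs each person touched, the other checking tags) might look roughly like the sketch below. It deliberately leaves out fetching data from GitHub: the input shapes (a dict of repository name to tag names, issues as dicts with `number` and `labels` keys, and `(person, number)` pairs) and all function names are assumptions made for illustration, not the project's real code.

```python
from collections import defaultdict


def inconsistent_labels(repo_labels):
    """Map each repo to the tags it lacks compared with the other repos.

    repo_labels: dict of repository name -> set of tag names (assumed shape).
    Repos that already have every tag are left out of the result.
    """
    all_labels = set().union(*repo_labels.values())
    gaps = {}
    for repo, labels in repo_labels.items():
        if all_labels - labels:
            gaps[repo] = all_labels - labels
    return gaps


def unlabeled_issues(issues):
    """Return the numbers of issues that have no tags at all.

    issues: list of dicts with 'number' and 'labels' keys (assumed shape).
    """
    return [issue["number"] for issue in issues if not issue.get("labels")]


def touched_by_person(events):
    """Group issue and PR numbers by the person who touched them.

    events: iterable of (person, number) pairs (assumed shape).
    """
    touched = defaultdict(set)
    for person, number in events:
        touched[person].add(number)
    return {person: sorted(numbers) for person, numbers in touched.items()}
```

Keeping the logic separate from the API calls means it can be tested without network access, and the output of `touched_by_person` can be pasted into the weekly status table as a reminder.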
A Note on Consensus
As Jo Freeman pointed out in “The Tyranny of Structurelessness”, every group has a power structure;
the only question is whether it is explicit and accountable, or implicit and unaccountable.
GitHub’s Minimum Viable Governance guidelines duck this issue:
2.1. Consensus-Based Decision Making
Projects make decisions through consensus of the Maintainers.
While explicit agreement of all Maintainers is preferred, it is not required for consensus.
Rather, the Maintainers will determine consensus based on their good faith consideration of a number of factors,
including the dominant view of the Contributors and nature of support and objections.
The Maintainers will document evidence of consensus in accordance with these requirements.
Since people’s idea of what constitutes “good faith” varies widely,
“consensus-based” means governance by the self-confident, stubborn, and well-connected,
which marginalizes a lot of people.
Martha’s Rules and other procedures
for putting proposals forward and voting on them aren’t perfect—democracy never is—but
I’m starting a new job as a software engineer with Deep Genomics next week.
I’m looking forward to writing Python for a living again after a decade of doing other things,
but hope I’ll be able to wrap up these side projects soon as well:
I let fly with some half-baked complaints about the state of document compilers on Twitter yesterday,
so I’d like to try to get some more organized thoughts down before I’m distracted again.
By now we should all be using WYSIWYG tools.
We don’t because version control tools refuse to diff and merge them.
I’ve ranted about this before;
I no longer believe it’s going to be fixed in my working lifetime,
so I’ll move on.
Jamstack’s list of open source static site generators (SSGs) currently has
over 300 entries.
Most of them are designed with blogging in mind,
which means they don’t meet a lot of other authorial needs out of the box:
Numbering chapters, sections, and subsections consecutively across files (e.g., across chapters).
Numbering figures, tables, examples, exercises, and everything else an author might want.
(No matter what counters you provide, people are going to need another one—for example,
did you notice that “theorems” wasn’t in the previous sentence’s list?)
Not requiring document names in cross-references,
because content often moves between files.
Not requiring manual numbering (e.g., an order number or weight in each chapter) because ditto.
Handling bibliographic citations, glossary references, index references, and a bunch of other things
without requiring a lot of typing.
Most SSGs are extensible if you speak the language (more on this below),
but many insist on a page-at-a-time processing model
so that (for example) numbering figures sequentially across chapters rather than within them
simply isn’t doable without external processing.
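To make the page-at-a-time problem concrete, here is a minimal two-pass sketch. The `{fig:slug}` and `{ref:slug}` markers are invented for this example and do not belong to any real SSG; the point is only that the first pass has to see every chapter before the second pass can fill in a single number.

```python
import re

# Hypothetical markup: {fig:slug} marks where a figure is defined,
# {ref:slug} refers to it from anywhere in the book.
FIGURE_DEF = re.compile(r"\{fig:([\w-]+)\}")
FIGURE_REF = re.compile(r"\{ref:([\w-]+)\}")


def number_figures(chapters):
    """Number figures consecutively across a list of chapter texts.

    chapters: chapter texts in book order. Returns new texts with every
    definition and reference replaced by "Figure N".
    """
    # Pass 1: walk all chapters in order and give each figure a number.
    numbers = {}
    for text in chapters:
        for slug in FIGURE_DEF.findall(text):
            if slug in numbers:
                raise ValueError(f"figure {slug!r} defined twice")
            numbers[slug] = len(numbers) + 1

    # Pass 2: fill in the numbers, including forward references.
    def fill(match):
        slug = match.group(1)
        if slug not in numbers:
            raise KeyError(f"reference to unknown figure {slug!r}")
        return f"Figure {numbers[slug]}"

    return [FIGURE_REF.sub(fill, FIGURE_DEF.sub(fill, text)) for text in chapters]
```

A reference in chapter 1 to a figure defined in chapter 2 only resolves because the numbering table is complete before any text is rewritten, and a tool that renders each page in isolation never has that table.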
There are layers on top of SSGs that handle some of these things,
but in my experience they’re very fragile.
(Go ahead, try to figure out which of Bookdown’s several configuration files you need to modify
to change the way pages are numbered.)
A large part of that fragility comes from reliance on LaTeX and/or Pandoc.
These are both powerful tools,
but like FORTRAN,
the startup costs for casual users are prohibitive
and the number of expert users is slowly but steadily dwindling.
(Try to get the LaTeX templates of any of the major publishers to work nicely with the SSG of your choice
and tell me how long it took you.
Now go to someone who hasn’t used LaTeX as long as you have and see how long it takes them.)
“Everybody should use the right tool for the job” isn’t a solution for the people I want to help,
any more than “everybody should use the right programming language for the job,”
because most people don’t have the free time I had in my twenties to master obscure technologies.
If you don’t agree,
we’re probably thinking about different audiences.
One thing yesterday’s Twitter exchange helped me realize is that
I think user-level in-tool extensibility is a must-have.
For all its quirks,
most people can build the customizations they want for LaTeX in the tool itself.
If you want to extend Pandoc you have to write in—well,
you get to pick,
which means that someone else who wants to use your extension has to install that language’s toolchain.
You also have to work at the parse tree level rather than by slinging bits of text around;
I recognize that the former is more general,
but so is assembly code.
At this point I’d like to put forward a proposal that solves all these problems at once,
but I don’t have one.
“Simple things are easy and hard things can be approached gradually without switching paradigms”
is what every tool builder aspires to,
but that doesn’t mean it’s always achievable.
I think that LaTeX-style text splicing is enough for a lot of common cases,
but a Turing-complete extension language is needed for more complex things,
and that language should be one that people use anyway instead of (for example)
the bastardized Ruby that things like Jekyll provide.
and liked them,
but they don’t provide a simpler mechanism equivalent to LaTeX’s \newcommand
with a couple of string parameters.
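As a rough illustration of what that simpler mechanism could be, the sketch below expands `\name{arg}{arg}` calls by splicing the arguments into a template that uses `#1`, `#2`, and so on. The function `expand_macros` and the `gloss` macro are made up for this example. It is nowhere near a real LaTeX expander: it does not handle nested braces or macros that expand to other macros.

```python
import re

# A macro call is a backslash, a name, and zero or more {argument} groups.
MACRO_CALL = re.compile(r"\\(\w+)((?:\{[^{}]*\})*)")
ARGUMENT = re.compile(r"\{([^{}]*)\}")
PLACEHOLDER = re.compile(r"#([1-9])")


def expand_macros(text, macros):
    """Expand \\name{arg1}{arg2} calls using templates with #1, #2, ...

    macros: dict of macro name -> template string (hypothetical format).
    Unknown macros are left untouched so ordinary backslashes survive.
    """
    def expand(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)
        args = ARGUMENT.findall(match.group(2))

        def splice(placeholder):
            index = int(placeholder.group(1)) - 1
            # Leave the placeholder alone if too few arguments were given.
            return args[index] if index < len(args) else placeholder.group(0)

        return PLACEHOLDER.sub(splice, macros[name])

    return MACRO_CALL.sub(expand, text)
```

With `macros = {"gloss": '<a href="/glossary/#1">#2</a>'}`, the text `\gloss{ssg}{static site generator}` becomes a glossary link, and an author who only knows how to write the template string never has to touch a parse tree.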
And of course my perspective is heavily biased by my background
and I might completely misunderstand the problems that most people face.
If anyone knows of a comparative usability study of different document compilers
(something more than just one person’s drive-by based on misreading those tools’ home pages),
I’d be grateful for a pointer.
Not long ago I interviewed for a community manager position
with a company that’s pretty well known in open source.
They opened by asking me to write paragraph-length answers to some fairly innocuous questions,
but in the second round,
they asked me to do an IQ test and a personality profile test.
Both of these were managed by a third party and done online;
after dithering for a few days
I decided to put my reservations aside and get on with it.
That’s when I discovered that the first page of each test
required me to enter my name,
my email address—and choose my gender,
with “Male” and “Female” being the only options.
At this point my reservations broke out of the basement I’d locked them in
and began clamoring for attention.
As far as I could tell,
the IQ test consisted of number-matching problems;
I’m not going to believe that scores in that need to be adjusted by gender
without seeing (and checking) the data.
Any company in the HR business today should know better than to restrict gender choices
to a binary “M or F”.
So I mailed the recruiter,
who connected me with the head of HR,
who told me that
they were using tests to eliminate unconscious bias in their hiring process.
When I asked if they had any evidence showing a correlation between
test scores and on-the-job performance,
the HR manager said no,
and that such evidence would be very hard to come by.
It took me less than a minute of searching online
to find companies that would give me
an unlimited number of practice sessions
for both the IQ and personality profile tests for a mere £75.
One of these sites hinted very strongly that for a negotiable fee
they’d be happy to provide “exam assistance”,
which meant “we’ll write the exam for you” back when I was a prof.
My conclusion was that the company I’d applied to
was more concerned with appearances than with eliminating bias,
so I withdrew my application.
I’m very privileged in being able to turn down jobs I don’t want.
You might not be able to do that—you
might have to set aside your queasiness and do the tests
even though your gut tells you their flaws are dangerous—but
I want you to know that your instincts are right.
And if you are able to say “no” when you’re in a situation like this,
please do so,
because every time someone senior says “yes”,
it’s that much harder for someone junior to refuse.
As I announced a few days ago,
I am running an online workshop on “Managing Research Software Projects” on September 29-30
to raise money for MetaDocencia.
Tickets are available on Eventbrite
and the curriculum and schedule are taking shape:
I’d be very grateful for feedback on what’s missing
and what you think doesn’t need to be covered.
Who you are, who's teaching this workshop, what we're going to cover, and how to participate.
How to ensure meetings are short, productive, and fair.
Figuring out who gets to decide what and how to tell when a decision has been made.
How to organize project collateral, make work findable, reproducible, and shareable.
How to attract new contributors and help them feel welcome and be productive.
Guidelines for creating software that is maintainable and reliable.
Managing who does what and when.
Things you need to do now that you're in charge.
What we actually know and why we believe it's true.
Because sometimes the only way to fix a problem is to fix the institution.
We may also delve into a few bonus topics:
Making research software robust enough to be run by anyone, anywhere.
Building good working relationships between academia and industry.
Making work more findable.
The basics of personal digital safety (because having a higher profile may attract the wrong kind of attention).