Not What I Set Out to Do

Software Tools in JavaScript is now available on Leanpub, and printed copies will soon be available as well. I’m glad it’s finally out, and I hope you find it useful, but it has fallen short of my original hopes in two different ways.

The first is that there’s been no interest in the book among people who teach software design and software engineering. I taught a course called “Software Architecture” several times at the University of Toronto in the early 2000s; while lots of textbooks had those two words in their title, none of them spent more than a few pages describing the designs of actual systems. My frustration with that led to Beautiful Code and The Architecture of Open Source Applications, but as popular as they were among working programmers, only a handful of teachers used them in class. The chapters were written at very different levels and used many different programming languages, which meant they required far more background knowledge than most undergraduates had.

STJS fixes that by tackling problems at more or less the same level, in a domain that software engineering students will be familiar with (programming tools), and by using a single language that most potential readers will already have encountered. Despite all that, and being free, none of the software engineering faculty I’ve reached out to have shown interest in using it.

But that’s the lesser of my disappointments. (Like all authors, I’ve become very good at cataloguing and ranking disappointments…) I never wanted to write this book myself: instead, my plan was to write a few chapters to get the ball rolling and then invite people who aren’t yet well known, but should be, to contribute a chapter each. When I look at Beautiful Code now, what strikes me is how homogeneous the contributors were: while a quarter of the people we reached out to were women, 35 of the final 36 contributors were men (and almost all of them were white). AOSA wasn’t quite as bad, but I wanted STJS to be better because I want computer science and the tech industry to be better.

Because here’s the thing: you’re not reading this because I program—you’re reading this because I write. Thanks to a few lucky accidents (like my father being a high school English teacher) I can put words together better and faster than most people. Whatever clout I have is a result of that, not my middling-good ability to code in C or Python, and I figured that if it worked for me by accident then maybe it could be made to work for other people by design.

When I started work on STJS three years ago, my plan was to invite people from groups our industry has marginalized or excluded to write chapters, and to turn those chapters into conference talks, to help raise their profiles. Of the people I contacted, though, only three agreed to, and none of them delivered. I understand why not: if you’re not a straight white or Asian male then you already have to do a lot of extra work to get ahead in tech. Doing even more work on the off chance that it will lead to something else is a bad investment, and being asked to do extra work without pay—again—is just wearying.

I’m pleased with the book itself: there are still some formatting glitches, some of the diagrams could be clearer, and I still don’t think I’ve explained JavaScript promises well, but it’s not bad for a first release. If I could wind the clock back to 2018, though, I wouldn’t have started it. Designing things well matters to me—thinking and talking about it reminds me of my brother—but fixing our broken industry matters more. I still hope to see it adopted in courses, and I still think a second volume with contributions from people who deserve to be on stage just as much as I ever did might make a difference, but looking at it now, I think I could have helped more people with those hours by doing something else.

Footnote: I received two messages within minutes of publishing this post. The first told me not to be so down on myself, to which I replied that I need to decide whether to put in a hundred hours to finish Building Software Together; if I don’t reflect on this project, I’m unlikely to do better with the next.

The second told me, “Don’t be such a bleeding heart. If diversty [sic] hires don’t want to work hard to get ahead they don’t deserve to.” I’m willing to bet those “diversity hires” have worked harder than the person who wrote that message has ever had to; I’m also willing to bet that he’ll never let himself see that.

Setting Up a New Project

I recently helped a group of about fifteen people set up a new research software engineering project (where by “new” I mean “restart something that was in bits and pieces scattered across half the internet”). They all had GitHub accounts already, and a couple of them had read Research Software Engineering with Python, but only one had any formal training as a programmer (a 12-week bootcamp four years ago). Here’s what we did in order—I’d be grateful for suggestions about what we missed or what you would reprioritize.

  1. Create a mailing list for the project.
    • The team voted 2:1 for email over Slack because they want better search and fewer interruptions.
  2. Create a new GitHub organization for the project and add everyone to it.
    • So that nothing belonging to the project is under a personal account.
  3. Create a new repo within that GitHub organization.
    • Everything is in one repo for now, but that might change.
  4. Redefine the tags in that repo.
    • Governance: discussion (including questions) and proposal (for votable items).
    • Issues: bug and request.
    • Pull requests: fix, enhancement, docs, and refactor.
    • Meta: paused, help wanted, good first issue.
  5. Add README, LICENSE, CODE_OF_CONDUCT, GOVERNANCE, Makefile, and .gitignore to the repo.
    • We settled on Make because nobody could agree on what to use instead.
  6. Create two pip requirements files:
    • requirements.txt is a minimal setup for using the software.
    • development.txt sources that and adds everything needed for building, testing, and documenting.
  7. Create socks, docs, and tests directories in the root of the repo along with a setup.py file.
    • Pretty standard structure for a pip-installable Python package (and no, “socks” isn’t its real name).
  8. Set up pytest for running tests and pdoc for building documentation.
    • tests/conftest.py for pytest.
    • A docstring in every __init__.py file (rather than leaving it empty) to make pdoc happy.
    • Use Google-style docstrings.
  9. Add targets to Makefile to:
    • Build the package.
    • Run the tests.
    • Run the tests with coverage and display the coverage report (to identify untested code).
    • Rebuild the documentation.
    • Run flake8, black, and isort to check that the code meets style guidelines.
  10. Add a workflow.yml file with pre-commit checks.
  11. Add a script that uses Jinja2 to turn hand-written documentation into HTML.
    • The team has Markdown design notes and the beginnings of a tutorial that they want to put beside the pdoc docs.
    • And a glossary.md file in glosario format.
  12. Add a data directory with sample data for testing and a couple of real datasets.
    • Each dataset is in its own subdirectory with a MANIFEST.yml file describing files, columns, provenance, etc.
  13. Add a CITATION.cff file with citation information.
    • And make sure every contributor has an ORCID.
  14. Add a bin directory at the top level with various utility scripts.
    • Most of which use the code in the socks directory directly (rather than through a local install of the package).
  15. Add a results directory at the top level with one sub-directory for each paper the team intends to produce.
    • Each sub-directory under results has its own Makefile.
    • make all in the project sub-directory regenerates everything.
    • We haven’t added a cookiecutter template yet, but we will.
  16. Add another Jinja2 script to convert CSV results files into HTML pages.
  17. Add a static directory with some CSS and JavaScript files.
    • Because everyone wants their HTML tables to be dynamically sortable…
  18. Add a BibTeX file to the root results directory to be used by all project papers.
  19. Write a short code review checklist.
    • How to run pre-commit checks, how and why to use the logging library, what exceptions to use for what, etc.
  20. Add a small utility script for loading program configurations.
    • In order: system config, personal config, project config, config file specified on the command line, command-line flags.
  21. Choose a project logo.
    • Which made discussion of build tools look calm and rational…
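
Step 20's layering can be sketched roughly as follows. This is a minimal illustration rather than the team's actual script: the JSON file format, the shallow merge, the rule that later sources override earlier ones, and the function names `read_config`, `merge_configs`, and `load_config` are all assumptions made for the example.

```python
import json
from pathlib import Path


def read_config(path):
    """Return the settings in a JSON file, or an empty dict if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return {}
    return json.loads(path.read_text())


def merge_configs(*layers):
    """Merge dicts left to right so that later layers override earlier ones."""
    merged = {}
    for layer in layers:
        if layer:
            merged.update(layer)  # shallow merge: nested dicts are replaced whole
    return merged


def load_config(system_path, personal_path, project_path, cli_path=None, cli_flags=None):
    """Combine settings in the order: system, personal, project, named file, flags."""
    layers = [read_config(p) for p in (system_path, personal_path, project_path)]
    if cli_path is not None:
        layers.append(read_config(cli_path))
    layers.append(cli_flags or {})
    return merge_configs(*layers)
```

Treating a missing file as an empty layer means a newcomer with no personal config still gets a working setup from the other sources.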

Maddy Roo

For this year’s National Novel Writing Month I revised a YA novel called Maddy Roo about a teenage kangaroo whose little sister is kidnapped by robots. You can read the whole thing as HTML, epub (for most e-book readers), or mobi (for Kindle). I hope you enjoy it; feedback would be greatly appreciated.

What (a subset of) Done Looks Like

I recently ran a workshop on managing research software projects, and one of the questions that came up was, “What does ‘done’ look like?” There are lots of answers elsewhere for the technical side [1, 2, 3], but what about project management and governance? Here’s a first cut of the artifacts used to support those activities for a project with up to a couple of dozen contributors; additions, deletions, and corrections would be very welcome. (As always, please email me: the last time I opened up comments on this site it took all of two days for the trolls to show up.)

  1. A shared Google Drive with a doc called “Roles and Responsibilities”
    • Google Doc because some collaborators aren’t comfortable with Git
      • And to make it easier to paste in figures and screenshots
    • Defines roles and explains what each is responsible for in one page
    • Each role has a doc of its own with its checklists
  2. The same shared Google Drive has one doc per year called (e.g.) “Progress 2022”
    • Section headings are weekly meeting dates
      • Table for each week with columns Name, Progress, Plans, and Problems (bullet points)
      • Anything too long to fit comfortably in the table is linked to an issue in the project’s GitHub repository
    • Project has a little script that lists issues and PRs touched by each person (reminder)
  3. Weekly hour-long status meeting (which often finishes early)
    • On Wednesday so that people aren’t scrambling on Friday or a weekend (or holiday Monday) to write status updates
    • Rotating moderator: last week’s moderator is this week’s note-taker
    • Before meeting, members star points in the status doc they want to discuss
    • Moderator draws up agenda based on starred items
  4. Proposals can be done as either Google Docs (in shared folder) or GitHub issues
    • Must be flagged to moderator the day before the meeting for inclusion
    • Added to agenda
  5. Project has a single repo with code, website, tutorials, etc.
    • So that releases are in sync
  6. Uses Google Docs (again) for publicity materials (because non-programmers)
    • All materials are owned by project account, not personal accounts
    • Every change larger than a typo produces a new doc
    • Every doc has date in title, e.g., “University Press Release 2022-05-13”
  7. Budgets for grant proposals, job contracts, etc., are stored in university system
    • Legal requirement
  8. GOVERNANCE.md in root directory of project explains Martha’s Rules
    • And membership rule: anyone who has had a PR merged in the last year or made some other significant contribution (as determined by the PI)
    • List of active members and alumni is in the foot of GOVERNANCE.md
  9. Another small script checks that the tags in each project repository are consistent and that each issue has at least one tag
  10. Project website has a “skills ladder” on the “Positions” page (even when positions aren’t open)
    • “What we mean by each of these terms for the research and coding tracks”
  11. Project website has a value statement and a contact address that isn’t anyone’s personal address
    • Plus a page for publications
    • Plus a page pointing at all repositories
    • Plus a “Getting Started” page
    • And a “Who’s Using Us How” page
    • And a “People” page
  12. The “help” option for the software includes the URL to the project page
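
The tag check in item 9 might look something like the sketch below. It is not the project's script: it assumes the issues have already been fetched as a list of dicts shaped like GitHub's REST API responses (a `number` plus a `labels` list whose entries have a `name`), and the function names and the allowed-tag set are hypothetical.

```python
def check_tags(issues, allowed):
    """Return a list of problems: issues with no tags or with tags outside the allowed set."""
    problems = []
    for issue in issues:
        names = [label["name"] for label in issue.get("labels", [])]
        if not names:
            problems.append(f"#{issue['number']}: no tags")
        for name in names:
            if name not in allowed:
                problems.append(f"#{issue['number']}: unknown tag '{name}'")
    return problems


def missing_tags(repo_tags):
    """Given {repo name: set of tags}, report the tags each repo lacks relative to the others."""
    every_tag = set().union(*repo_tags.values())
    return {
        repo: sorted(every_tag - tags)
        for repo, tags in repo_tags.items()
        if every_tag - tags
    }


if __name__ == "__main__":
    sample = [
        {"number": 1, "labels": [{"name": "bug"}]},
        {"number": 2, "labels": []},
    ]
    for problem in check_tags(sample, {"bug", "request"}):
        print(problem)
```

Keeping the checks as pure functions over already-fetched data makes them easy to test without network access; the fetching step is left out here on purpose.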

A Note on Consensus

As Jo Freeman pointed out in “The Tyranny of Structurelessness”, every group has a power structure; the only question is whether it is explicit and accountable, or implicit and unaccountable. Unfortunately, GitHub’s Minimum Viable Governance guidelines duck this issue:

2.1. Consensus-Based Decision Making

Projects make decisions through consensus of the Maintainers. While explicit agreement of all Maintainers is preferred, it is not required for consensus. Rather, the Maintainers will determine consensus based on their good faith consideration of a number of factors, including the dominant view of the Contributors and nature of support and objections. The Maintainers will document evidence of consensus in accordance with these requirements.

In practice, since people’s idea of what constitutes “good faith” varies widely, “consensus-based” means governance by the self-confident, stubborn, and well-connected, which marginalizes a lot of people. Martha’s Rules and other procedures for putting proposals forward and voting on them aren’t perfect—democracy never is—but they’re better.

Three Weeks Off

I’m starting work as a software engineer with Deep Genomics tomorrow; here’s what I’ve done in the three weeks since I left Metabase:

  • Ran a workshop on Managing Research Software Projects to raise money for MetaDocencia.

  • Wrote 43 posts on empirical software engineering research for It Will Never Work in Theory.

  • Started acting as client and product owner for some projects based on Software Tools in JavaScript for Mike Hoye’s software engineering course at the University of Toronto.

  • Started playing the clarinet (because tenosynovitis in my right hand meant I had to give up guitar).

  • Revised a middle-grade novel titled Maddy Roo (furries versus robots with a bit of family drama)—if you’d like to give it a read, I’d be grateful for feedback.

  • Started co-authoring a paper on how to make scientific publications more accessible.

  • Started co-authoring an update to “Software Carpentry: Lessons Learned”.

Current Project List

I’m starting a new job as a software engineer with Deep Genomics next week. I’m looking forward to writing Python for a living again after a decade of doing other things, but hope I’ll be able to wrap up these side projects soon as well:

More Thoughts on Document Compilers

I let fly with some half-baked complaints about the state of document compilers on Twitter yesterday, so I’d like to try to get some more organized thoughts down before I’m distracted again.

  1. By now we should all be using WYSIWYG tools. We don’t because version control tools refuse to diff and merge them. I’ve ranted about this before; I no longer believe it’s going to be fixed in my working lifetime, so I’ll move on.

  2. Jamstack’s list of open source static site generators (SSGs) currently has over 300 entries. Most of them are designed with blogging in mind, which means they don’t meet a lot of other authorial needs out of the box:

    • Numbering chapters, sections, and subsections consecutively across files (e.g., across chapters).
    • Numbering figures, tables, examples, exercises, and everything else an author might want. (No matter what counters you provide, people are going to need another one—for example, did you notice that “theorems” wasn’t in the previous sentence’s list?)
    • Not requiring document names in cross-references, because content often moves between files.
    • Not requiring manual numbering (e.g., an order number or weight in each chapter) because ditto.
    • Handling bibliographic citations, glossary references, index references, and a bunch of other things without requiring a lot of typing.

    Most SSGs are extensible if you speak the language (more on this below), but many insist on a page-at-a-time processing model, so that (for example) numbering figures consecutively across rather than within chapters simply isn’t doable without external processing.

  3. There are layers on top of SSGs that handle some of these things, but in my experience they’re very fragile. (Go ahead, try to figure out which of Bookdown’s several configuration files you need to modify to change the way pages are numbered.) A large part of that fragility comes from reliance on LaTeX and/or Pandoc. These are both powerful tools, but like FORTRAN, the startup costs for casual users are prohibitive and the number of expert users is slowly but steadily dwindling. (Try to get the LaTeX templates of any of the major publishers to work nicely with the SSG of your choice and tell me how long it took you. Now go to someone who hasn’t used LaTeX as long as you have and see how long it takes them.)

  4. “Everybody should use the right tool for the job” isn’t a solution for the people I want to help, any more than “everybody should use the right programming language for the job,” because most people don’t have the free time I had in my twenties to master obscure technologies. If you don’t agree, we’re probably thinking about different audiences.
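
To make the numbering problem in point 2 concrete, here is a minimal two-pass sketch of why a page-at-a-time model falls short: the first pass has to see every chapter before the second can fill in any reference. The `{fig:label}` and `{ref:label}` markers are a made-up syntax for this example, not the notation of any particular SSG, and the function names are hypothetical.

```python
import re

FIGURE_DEF = re.compile(r"\{fig:([\w-]+)\}")  # hypothetical marker that defines a figure
FIGURE_REF = re.compile(r"\{ref:([\w-]+)\}")  # hypothetical marker that refers to one


def number_figures(chapters):
    """First pass: assign consecutive numbers to figures across all chapters, in order."""
    numbers = {}
    for text in chapters:
        for label in FIGURE_DEF.findall(text):
            if label in numbers:
                raise ValueError(f"duplicate figure label: {label}")
            numbers[label] = len(numbers) + 1
    return numbers


def resolve(chapters, numbers):
    """Second pass: replace every definition and reference with 'Figure N'."""
    def replace(match):
        # An unknown label raises KeyError, which flags a dangling reference.
        return f"Figure {numbers[match.group(1)]}"

    return [FIGURE_REF.sub(replace, FIGURE_DEF.sub(replace, text)) for text in chapters]
```

Because references carry only a label and no document name, a figure can move between chapters without any cross-reference needing to change.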

One thing yesterday’s Twitter exchange helped me realize is that I think user-level in-tool extensibility is a must-have. For all its quirks, most people can build the customizations they want for LaTeX in the tool itself. If you want to extend Pandoc you have to write in—well, you get to pick, which means that someone else who wants to use your extension has to install that language’s toolchain. (Have fun.) You also have to work at the parse tree level rather than by slinging bits of text around; I recognize that the former is more general, but so is assembly code.

At this point I’d like to put forward a proposal that solves all these problems at once, but I don’t have one. “Simple things are easy and hard things can be approached gradually without switching paradigms” is what every tool builder aspires to, but that doesn’t mean it’s always achievable. I think that LaTeX-style text splicing is enough for a lot of common cases, but a Turing-complete extension language is needed for more complex things, and that language should be one that people use anyway instead of (for example) the bastardized Ruby that things like Jekyll provide. I’ve played with some SSGs that use JavaScript as the extension language, and liked them, but they don’t provide a simpler mechanism equivalent to LaTeX’s \newcommand with a couple of string parameters.
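
As a rough illustration of what a text-splicing mechanism with a couple of string parameters could look like, here is a small sketch. It is not how LaTeX or any existing SSG implements macros: the function names are hypothetical, arguments cannot be nested, and templates use `#1` to `#9` as placeholders by analogy with \newcommand.

```python
import re

# A command is a backslash, a name, and zero or more {argument} groups without nesting.
COMMAND = re.compile(r"\\(\w+)((?:\{[^{}]*\})*)")
ARGUMENT = re.compile(r"\{([^{}]*)\}")
PLACEHOLDER = re.compile(r"#([1-9])")


def fill(template, args):
    """Replace #1, #2, ... in the template with the corresponding arguments."""
    def pick(match):
        index = int(match.group(1)) - 1
        # Leave the placeholder alone if too few arguments were supplied.
        return args[index] if index < len(args) else match.group(0)

    return PLACEHOLDER.sub(pick, template)


def make_expander(macros):
    """Build a function that expands known commands and leaves unknown ones untouched."""
    def expand_one(match):
        name, raw_args = match.group(1), match.group(2)
        if name not in macros:
            return match.group(0)
        return fill(macros[name], ARGUMENT.findall(raw_args))

    return lambda text: COMMAND.sub(expand_one, text)


if __name__ == "__main__":
    expand = make_expander({"term": "<em>#1</em> (#2)"})
    print(expand(r"A \term{fixture}{test setup} is created first."))
```

A definition here is just a name and a template string, so an author can add one without touching a parse tree; anything more complex would still need a real extension language.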

And of course my perspective is heavily biased by my background and I might completely misunderstand the problems that most people face. If anyone knows of a comparative usability study of different document compilers (something more than just one person’s drive-by based on misreading those tools’ home pages), I’d be grateful for a pointer.

IQ and Personality Tests

Not long ago I interviewed for a community manager position with a company that’s pretty well known in open source. They opened by asking me to write paragraph-length answers to some fairly innocuous questions, but in the second round, they asked me to do an IQ test and a personality profile test.

Both of these were managed by a third party and done online; after dithering for a few days I decided to put my reservations aside and get on with it. That’s when I discovered that the first page of each test required me to enter my name, my email address—and choose my gender, with “Male” and “Female” being the only options.

At this point my reservations broke out of the basement I’d locked them in and began clamoring for attention. As far as I could tell, the IQ test consisted of number-matching problems; I’m not going to believe that scores in that need to be adjusted by gender without seeing (and checking) the data. Second, any company in the HR business today should know better than to restrict gender choices to a binary “M or F”.

So I mailed the recruiter, who connected me with the head of HR, who told me that they were using tests to eliminate unconscious bias in their hiring process. When I asked if they had any evidence showing a correlation between test scores and on-the-job performance, the HR manager said no, and that such evidence would be very hard to come by.

It took me less than a minute of searching online to find companies that would give me an unlimited number of practice sessions for both the IQ and personality profile tests for a mere £75. One of these sites hinted very strongly that for a negotiable fee they’d be happy to provide “exam assistance”, which meant “we’ll write the exam for you” back when I was a prof. My conclusion was that the company I’d applied to was more concerned with appearances than with eliminating bias, so I withdrew my application.

I’m very privileged in being able to turn down jobs I don’t want. You might not be able to do that—you might have to set aside your queasiness and do the tests even though your gut tells you their flaws are dangerous—but I want you to know that your instincts are right. And if you are able to say “no” when you’re in a situation like this, please do so, because every time someone senior says “yes”, it’s that much harder for someone junior to refuse.

Managing Research Software Projects Curriculum

As I announced a few days ago, I am running an online workshop on “Managing Research Software Projects” on September 29-30 to raise money for MetaDocencia. Tickets are available on Eventbrite and the curriculum and schedule are taking shape: I’d be very grateful for feedback on what’s missing and what you think doesn’t need to be covered.

Day 1

Start  Subject       Summary
10:00  Introduction  Who you are, who's teaching this workshop, what we're going to cover, and how to participate.
10:15  Meetings      How to ensure meetings are short, productive, and fair.
10:55  Governance    Figuring out who gets to decide what and how to tell when a decision has been made.
11:25  Break
11:40  Mechanics     How to organize project collateral, make work findable, reproducible, and shareable.
13:00  Break
13:10  Newcomers     How to attract new contributors and help them feel welcome and be productive.
14:00  Finish

Day 2

Start  Subject               Summary
10:00  Design                Guidelines for creating software that is maintainable and reliable.
11:00  Workflow              Managing who does what and when.
11:30  Leadership            Things you need to do now that you're in charge.
12:00  Break
12:15  Software Engineering  What we actually know and why we believe it's true.
13:15  Break
13:25  Change                Because sometimes the only way to fix a problem is to fix the institution.
14:00  Finish

We may also delve into a few bonus topics:

  • Making research software robust enough to be run by anyone, anywhere.
  • Building good working relationships between academia and industry.
  • Making work more findable.
  • The basics of personal digital safety (because having a higher profile may attract the wrong kind of attention).
  • What to do if you’ve been fired.
  • How to wrap up and move on when the time comes.

In the wake of posts about Shopify's support for white nationalists and DataCamp's attempts to cover up sexual harassment, I have had to disable comments on this blog. Please email me if you'd like to get in touch.