Our Process


Back in 2022 I wrote that there are two ways of looking at roles in a small-to-medium tech company: what people do, and what risks they address. Coincidentally, I had just started working with someone who has turned out to be the best product manager I’ve ever met. In the two years since she joined the company she has overseen lots of improvements in our software engineering processes (partly, I suspect, to make it easier for her to do her job properly). She recently summarized the state of play to orient a new hire; what follows is a summary of her summary with a few explanatory notes that I hope will be useful to other software developers working in research-intensive organizations.

I believe the most important part of what follows is how she has organized her thinking. Each process exists to answer key questions, has its own frequency, is someone’s responsibility, and generates specific products. If you have a description like this of how your team operates, please get in touch: I think we could learn a lot from comparisons.

Company Goals and Priority Projects

This activity happens up in the stratosphere. As an engineering manager I may occasionally be called on for input, but for the most part this is the ground for everything else described below.

Engineering Team

Our team includes about a dozen people and is responsible for half that number of applications, ranging from complex C++ libraries wrapped in Python, to three-tier web applications for managing experimental data, to command-line tools that support routine tasks. The activities below are done by the whole team.

Engineering OKRs

The VP of Engineering defines objectives and key results at the start of each quarter based on discussions with senior management, the leaders of other teams, and members of our team. Goals are prioritized as P0 (not doing this will block something critical to the company), P1 (we need this, but we can scale it back or it can slip if a P0 overruns), and P2 (it would be nice to have).

Note: we used to define thirty or more tasks each quarter rather than a handful of OKRs. The theory was that if we were meeting all of our targets, our targets weren't ambitious enough. We now realize that setting ourselves up to fall short time and again isn't actually helpful.
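The P0/P1/P2 scheme is simple enough to capture in a few lines. Here is a minimal sketch of how those priorities might order a quarter's planning; the class names, fields, and example objectives are all invented for illustration, not our actual tooling:

```python
from dataclasses import dataclass, field
from enum import IntEnum

class Priority(IntEnum):
    """Lower value = higher priority, matching the P0/P1/P2 labels."""
    P0 = 0  # not doing this blocks something critical to the company
    P1 = 1  # needed, but can be scaled back or slip if a P0 overruns
    P2 = 2  # nice to have

@dataclass
class Objective:
    title: str
    priority: Priority
    key_results: list[str] = field(default_factory=list)

def planning_order(objectives):
    """Sort objectives so P0 work is considered first."""
    return sorted(objectives, key=lambda o: o.priority)

okrs = [
    Objective("Polish the dashboard styling", Priority.P2),
    Objective("Migrate auth before cert expiry", Priority.P0,
              ["All services on new identity provider"]),
]
print([o.title for o in planning_order(okrs)])
```

The point of the enum is that the labels carry their ordering with them, so any tool that sorts by priority agrees with the humans reading the labels.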

Engineering OKR Reviews

Improvement in anything requires reflective practice. Our end-of-quarter OKR reviews therefore have two purposes: to make sure everyone understands where we actually are, and to summarize what we’ve learned so that we can do better next time.

Engineering Quarterly Retrospective

Similarly, we have a general retrospective at the end of each quarter to ask ourselves wider-ranging questions that may not be specific to a particular project’s goals. Practices like having an on-call user support rotation have grown out of these self-reviews.

Note: our product manager has been doubling as our scrum master for the last couple of years. Those jobs are usually done by different people in larger organizations, so the description above lists the role rather than the person who happens to be doing it right now.

Engineering All Hands

Our company includes biologists working in a wet lab, data scientists, and machine learning specialists staring into the void and hoping it blinks first. And while most of us are in Toronto, a substantial minority live and/or work remotely. This meeting is where the VP of Engineering updates us all on things we might otherwise miss (both organizationally and technically). We also host guest speakers from other teams in the company so that we can keep abreast of what they’re doing, have AMA (ask me anything) meetings with senior management, or do a deep dive into new tools or processes we might want to adopt.

Engineering Design Reviews

One of the things that has come out of our end-of-quarter reviews is a greater emphasis on presenting designs for new features or applications to the whole team before diving into implementation. The goal here is not to create a pile of UML diagrams that no one will ever look at again, but rather to explain what the problem actually is, what factors a solution must take into account, what alternatives there are, and what we’re not going to do. Some designs go through two or three overhauls before work starts; in almost every case, checking our thinking like this has saved us the pain and delay of re-thinking, re-architecting, and re-writing halfway through implementation.

A recent change is to update the design doc a few weeks after work starts to explain where and why reality has diverged from the plan. Doing this extends the useful life of that document, but more importantly, it’s a chance to reflect on what the code is teaching us and share that with other people so that our next design(s) can be better.

Weekly Friday Engineering Demos

One of our more experienced team members instituted Friday afternoon demo sessions last year, and we have all become believers. Some demos show off a new feature; others walk the audience through a bug that has been fixed or introduce the team to useful third-party tools. Demos run anywhere from a few minutes to half an hour, with most being near the low end of that range. It’s a fun way to wrap up the week and remind ourselves that we’re actually making progress.

User Support

As mentioned earlier, we now take week-long turns monitoring the #eng-support channel and responding to users’ problems. In some cases (e.g., when it’s my turn to do support) the response usually takes the form of, “OK, let me pull in someone who actually understands what’s going on here,” but whether the on-call engineer handles the problem themself or hands it off, it gives us all regular reminders of how our software is being used and what state it is actually in.

The “issues” product deserves a bit more explanation. The on-call engineer creates a new issue in our issue tracker (we use Linear) for every incident so that there is a single point of reference for the conversation. Doing this saves people from having to hunt through GitHub, a wiki, and various Slack channels and/or Google docs when there’s a hand-off or the issue occurs again.
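Linear exposes a GraphQL API, so filing the incident issue is easy to script. The sketch below only builds the request payload for Linear's `issueCreate` mutation rather than sending it; the helper name, field choices, and example values are assumptions for illustration, not our actual on-call script:

```python
import json

def incident_issue_payload(team_id, summary, slack_permalink):
    """Build the GraphQL payload for filing one #eng-support incident
    as a Linear issue (one issue per incident, per the process above)."""
    mutation = """
    mutation CreateIncident($teamId: String!, $title: String!, $description: String!) {
      issueCreate(input: {teamId: $teamId, title: $title, description: $description}) {
        issue { identifier url }
      }
    }
    """
    return {
        "query": mutation,
        "variables": {
            "teamId": team_id,
            "title": f"[support] {summary}",
            # Link back to the original Slack thread so the issue stays
            # the single point of reference for the conversation.
            "description": f"Reported in #eng-support: {slack_permalink}",
        },
    }

payload = incident_issue_payload(
    "TEAM-123",
    "export fails on large datasets",
    "https://example.slack.com/archives/C01/p42",
)
print(json.dumps(payload["variables"], indent=2))
```

Whatever the tracker, the useful invariant is the same: every incident gets exactly one issue, and that issue links back to where the report came from.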

Fixit Sprints

We work in two-week sprints, and for the last year and a half, we’ve dedicated one sprint each quarter to paying down technical debt. If we need to clean up the data access layer in an application or migrate to a backward-incompatible version of a key library, we put that work in “our” sprint so that it won’t disrupt work on user-facing changes. The benefits are hard to quantify but very real, and making this a regular activity makes the time easier to defend.

Team Activities

Everyone on the engineering team is involved in the activities described above. In contrast, the activities below are undertaken by specific people or groups of people.

Problem Discovery

If you’ve never had the pleasure of working with a good product manager, you may not realize how much better your life could be. This activity and the one that follows are the core of what they do: question, observe, and listen to users, figure out what their actual needs are, and translate those needs into things for engineers to fix or build.

Solution Discovery

This activity is where product management and engineering design overlap. Its output is the starting point for the design docs mentioned earlier: in particular, this is where needs, constraints, and big-picture alternatives are formulated.

Engineering Needs

Software engineers are people too, and have their own specialized needs. Our product manager occasionally treats us as users and uses discovery techniques to (help us) figure out what would make our lives better.

Biweekly Sprint Goals/Backlog Grooming

OK, we’ve agreed on what we’re going to build—now we actually have to build it. Every two weeks, the product manager and the project’s technical lead create issues for new work and (if necessary) relabel old issues to define what we’re going to try to accomplish in this sprint. Most of the new issues come from the design docs, or are elaborations or breakdowns of things discussed in those docs; depending on the size of the work, we may create several sub-issues under a single high-level issue so that progress reports aren’t “I’m working on it” several days in a row.

I’ve worked on teams where issues had a “percent complete” field that the people responsible for them were supposed to update daily. It wasn’t helpful: it’s easy to see how much ditch you have left to dig, but much harder to know how much code you have left to write and debug. Breaking milestones down into inch-pebbles and marking the latter as “finished” or “not finished” has proven much more useful.
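The inch-pebble scheme is easy to model, which is part of its appeal: the rollup is a count of done/not-done flags rather than a self-reported guess. A toy sketch (the sub-task names are invented):

```python
def progress(inch_pebbles):
    """inch_pebbles: dict mapping sub-task name -> finished (bool).
    Each inch-pebble is binary, so the milestone's progress is a count,
    not an estimate anyone has to invent."""
    return sum(inch_pebbles.values()) / len(inch_pebbles)

milestone = {
    "write schema migration": True,
    "backfill existing rows": True,
    "switch reads to new table": False,
    "delete old table": False,
}
print(f"{progress(milestone):.0%}")  # → 50%
```

Two finished pebbles out of four gives 50%, and nobody had to guess whether “switch reads” was 30% or 70% done.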

Separately, it’s worth highlighting again that a big part of this process is making sure that the people who are doing the work understand why they’re doing it so that they can make small judgment calls on their own and know when to come back and ask for guidance. You can tell a team has reached this point if most people, most of the time, would make the same decision about something minor on their own, and if they would mostly agree on when discussion is needed.

Daily Standups

We don’t actually stand up, but this is still the most important meeting each team has. The tech lead of the project does a quick lap around the team to find out what each person has done, what they’re planning to do, and what’s holding them back. We try to enforce a three-sentence rule: if a discussion during individual updates goes on longer than that, we defer it until after everyone has finished their initial report.

Sprint Retros

The suggestions coming out of sprint retros tend to be smaller than those coming out of quarterly retros, but big things do show up sometimes.

Feature/Slice Deployment

Customer Feedback and Metrics