Managing Research Software Projects Workshop

I am running an online workshop on “Managing Research Software Projects” from 10:00-14:00 Toronto time on Sep 29-30 to raise money for MetaDocencia, an inclusive and collaborative community that improves education by empowering instructors from underserved countries. The workshop introduces the ideas and tools you need to manage a team of up to a dozen people working together to build research software. The workshop is intended for new faculty who are setting up their labs, the creators of open source projects that now have other contributors, and everyone else who finds themselves wrangling people and deadlines as well as code. Topics will include:

  • how to run an effective meeting,
  • how to recruit new contributors and help them be productive,
  • how to prioritize work,
  • how to design complex software and ensure its reliability, and
  • how to share what you’ve done.

Tickets and details are available at https://www.eventbrite.com/e/managing-research-software-projects-tickets-169833409299 - I hope to see you there.

If your company would like to sponsor this workshop, the best way is to purchase tickets for people who couldn’t otherwise take part—please contact me for details, and thank you in advance.

Learner Persona

Jo, 31, completed a PhD in geology several years ago and now works for a national laboratory. The fracture modeling software they wrote in grad school is now being used by two dozen research groups around the world, several of which have started contributing fixes and extensions of varying quality. Jo has just been given a post-doc and a junior programmer to expand the code as well, and wants to learn how to decide which pull requests are safe to merge, decide what’s most important to work on next, and handle people who spend more time arguing on Slack than they do writing code. This workshop will show them what a healthy mid-sized project looks like and how to manage both staff and external contributors.

Textbooks (Alone) Are Not Enough

Yesterday I tweeted:

After reviewing three books more-or-less titled “Data Science for Social Scientists”, I think what our field desperately needs is “Social Science for Data Scientists”. I don’t know enough to write it, but I’ll pre-order a bunch of copies…

Someone replied, “…what you’re asking for is a textbook - those exist.” With respect, I disagree. Most people—including most working programmers—won’t read a textbook unless they absolutely have to. Saying that we can fix this problem by writing (or pointing at) a textbook is like saying that abstinence programs are the solution to teen pregnancy: it allows you to claim you’ve solved the problem without threatening anything else you want to believe in.

This response highlights a perspective I’ve struggled against for many years. Brent Gorda and I chose the name Software Carpentry because it wasn’t software “engineering”. We wanted to teach people the equivalent of hanging drywall and fixing leaky taps, not the equivalent of digging the Channel Tunnel. What we learned is that (a) carpentry is more useful to most people than engineering, but (b) skilled trades have lower social status than “gentlemanly” pursuits involving multi-colored algebra on whiteboards.

Academia reacts the same way to popularization: John Galbraith and Carl Sagan were both looked down on by many of their peers for deigning to explain science to non-specialists. Those who do can have tremendous impact, and not always for good. Freakonomics persuaded literally millions of readers that the only valid way to analyze social interactions was through the lens of personal interest. By doing so, it built a constituency for changes in legislation and taxation that have fueled increasing inequality in society.

So if “programmers and data scientists don’t understand how society works” is the problem, I don’t think textbooks are the answer. They can help people who are already convinced they want to know more to keep learning, but “already convinced” is the hard part. For that, we need someone who knows enough to know what corners to cut and what simplifications will not mislead, and who doesn’t think that being comprehensible is somehow shameful. We need someone who can explain to programmers steeped in Silicon Valley’s small-L libertarian zeitgeist why racial discrimination persists even though it’s economically inefficient, how regulatory capture works, why CEOs keep getting away with sexual assault, and why Radical Candor is bullshit in the service of power. If this is you, please give me a shout.

Software Design Rules

My webinar on “Software Design for Data Scientists” raised over $600 for Books for Africa. The key points are summarized below; if you’d like me to give the talk to your company, university, or other organization, I’d be happy to do so in exchange for a donation to BfA or some other mutually-agreed charity.

Rule 0: Computer scientists aren’t taught software design either, so don’t feel like you’ve missed something.

Rule 1: Design for people’s cognitive capacity (7±2, chunking, and all that).

Rule 2: Design toward widely-used abstractions and maximize the ratio of “what’s unique in this statement” to boilerplate, but remember that the tradeoff between abstraction and comprehension depends on how much people already know.

Rule 3: Design for evolution, because the problem, the tools, and your understanding will all change each other. The key tool for this is information hiding; the Liskov Substitution Principle and Design by Contract will help.

Rule 4: Design for testability, not just because you want to test, but because it’s a way to validate designs.

Rule 5: Design as if code was data, because it is. Programs are just text files; code in memory is just another data structure, and taking advantage of this can make designs much more elegant (but also less comprehensible to newcomers).

Rule 6: Design for delivery - organize your code the way your packaging system expects, handle errors instead of printing them and discarding them, and use a proper logging system.

Rule 7: Design graphically (but don’t try to create software blueprints). Flowcharts, informal architecture diagrams, entity-relationship diagrams, and use case maps will all help people understand the overall design.

Rule 8: Design after the fact. The most important thing isn’t to follow any particular process, it’s to look as though you did so that the next person can retrace your steps.

Rule 9: Design with villains in mind, because security, privacy, and fairness can’t be sprinkled on after the fact.

Rule 10: Design collaboratively and inclusively, because it will produce a better design (and because it’s just the right thing to do).
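
To make Rule 6 a little more concrete, here is a minimal Python sketch of handling errors and logging them rather than printing and discarding them. It is an illustration only, not material from the webinar: the logger name `pipeline`, the `DataLoadError` class, and the functions `parse_reading` and `load_readings` are all hypothetical names invented for this example.

```python
import logging

# Hypothetical logger name; a real project would usually use its package name.
logger = logging.getLogger("pipeline")


class DataLoadError(Exception):
    """Raised when an input record cannot be parsed (hypothetical error type)."""


def parse_reading(text):
    """Convert one piece of text to a float, raising DataLoadError on failure."""
    try:
        return float(text)
    except ValueError as exc:
        # Pass the problem up to the caller instead of printing and moving on.
        raise DataLoadError(f"bad reading: {text!r}") from exc


def load_readings(lines):
    """Parse every entry, returning the good values and the entries skipped."""
    good, skipped = [], []
    for line in lines:
        try:
            good.append(parse_reading(line))
        except DataLoadError as exc:
            # The caller decides what to do; the log keeps a durable record.
            logger.warning("skipping reading: %s", exc)
            skipped.append(line)
    return good, skipped
```

Calling `load_readings(["1.5", "oops", "3"])` returns the parsed values together with the entries that were skipped, so the caller can decide whether one bad record is tolerable. Where the log messages go (console, file, or elsewhere) is then a configuration choice rather than something baked into the code.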

Whatever Happened to TidyBlocks?

TidyBlocks is a Scratch-like tool for doing basic data science. Originally built by Maya Gans, it was overhauled in 2020, after which volunteers translated its interface into several (human) languages. We were excited by its potential, but:

  1. We had reached the limits of what the Blockly toolkit could do without some serious extension work. (In particular, there’s no comprehensible way to represent joins using the available styles of blocks.)

  2. Nobody was willing to fund further development. The overhaul in 2020 took about 300-400 hours of volunteer time; while I would have liked to continue, I didn’t see a way forward without fixing #1 above, and that couldn’t be done without financial support.

I still think the idea is a good one: the user testing we did showed that the interface is immediately comprehensible to anyone who has used Scratch (which these days means most middle school kids and their teachers), and after watching my daughter plod through their school’s “data literacy” module, I think we need something better. I hope someone, some day, will find a way to make it happen.

I Hope They Would Have Liked It

My mum would have been 94 today, and my sister would have been 57. Mum knit every day for more than 80 years and Sylvia collected toy mice, so I got this. I suspect both would have disapproved but secretly been pleased.

Knitting Mouse tattoo

What Everyone in Tech Should Know About Teaching and Learning

I have just posted a 40-minute video of my talk “What Everyone In Tech Should Know About Teaching and Learning”. It’s a quick tour of the most popular material from Teaching Tech Together, which in turn is based on training I originally developed at the Carpentries and for RStudio’s instructor training and certification program. The slides are available under a Creative Commons license, and I’m always happy to deliver it as a lunch-and-learn. I hope you find it useful—feedback is always welcome.

Software Engineering's Greatest Hits

I have just posted a 30-minute video of my talk “Software Engineering’s Greatest Hits”. In brief: software engineering researchers have learned a lot over the last 50 years, but most working programmers don’t even know that knowledge exists. I think the way to close that knowledge gap is to teach a bit of data science to undergraduate computer scientists so that they’ll understand what claims are actually being made, and then tell them what we currently think we know. To make this work, I think we have to teach data science using software engineering data and examples—there are lots of good generic data science courses out there, but most people learn best and fastest when the examples are directly relevant to their own domain. A course like this would fit into the curriculum and be culturally defensible (“Look, math!”) and I think it would also be very popular with students (“Look, data science!”). And if I’ve learned anything in the last 20 years, it’s that simply presenting the results bounces off: if it was going to work, it would have by now. I hope you’ll enjoy the talk - comments and feedback are very welcome.

Beneath Coriandel

My novel Beneath Coriandel is now for sale on Amazon in Canada, the UK, the US, and elsewhere. I hope you enjoy reading it as much as I enjoyed writing it.

Cover of 'Beneath Coriandel'

A young man descends into the tombs beneath the city to slay a monster, a woman plots to steal her niece’s youth, a ghost explains how he earned his name, and a magician wonders why she can’t get an old nursery rhyme out of her head. With spies, swordplay, betrayal, forbidden love, and a philosophically inclined pair of boots, Beneath Coriandel will appeal to anyone who enjoyed The Curse of Chalion, The Lies of Locke Lamora, or The Innkeeper’s Song.

In the wake of posts about Shopify's support for white nationalists and DataCamp's attempts to cover up sexual harassment, I have had to disable comments on this blog. Please email me if you'd like to get in touch.