I am grateful to Prof. Alberto Bacchelli
for inviting me to give a colloquium at the University of Zurich
a couple of days ago.
My talk was titled Cocaine and Conway’s Law,
but what it was really about was how to teach young software developers about cognitive pollution.
When most existing software engineering courses talk about risk,
they talk about single-point incidents like the Therac-25
or Ariane 5 flight V88
where there is a direct cause-and-effect relationship between a mistake in software and something bad happening.
I think we should instead talk about things like
tetraethyl lead
and asbestos,
or about Purdue Pharma’s role in the opioid epidemic
and the fossil fuels lobby’s campaign
to promote climate change denial.
In each of these cases,
deliberate choices increased the general level of risk to hundreds of millions of people,
but the statistical nature of that risk
allowed those responsible to avoid personal accountability.
I think that case studies like these will help learners understand things like
the role that Meta’s algorithms played in the Rohingya genocide
and think more clearly about scenarios like the one below:
It is 2035.
Most men under 20 have learned what they “know” about what women like
from AI chatbots trained on porn and tuned to maximize engagement.
As a result,
many of them believe that a frightened “no” actually means “yes please”.
The people who have earned billions from these systems
cannot legally be held accountable for their users’ actions.
How does that make you feel?
If you are currently teaching an undergraduate course that covers cases like these,
please get in touch:
I’d be grateful for a chance to learn from you.
Please note that I am suffering from jet lag and recovering from a bad cold while writing this,
which means my proposal may well be garbage.
I’ve had a lot of conversations over the years about
the differences in how software engineers and data scientists work.
One example is how they manage software:
Software engineers regard duplicated code as sinful and refactor to avoid it.
Data scientists routinely copy a notebook or a script and make small changes to do a new data analysis.
After many years,
I have accepted that they are right to do so:
their analyses are often exploratory one-offs,
so copy-and-modify is more efficient than generalize-and-parameterize.
The problem is that software engineers build tools for software engineers,
which means those tools don’t automatically support data scientists’ workflows.
Continuing the refactor-versus-copy example,
Git doesn’t have a way to explicitly say “this file started as a copy of that one”.
Git has a way to say “this file was moved or renamed” (git mv),
but there isn’t a corresponding git cp command
because software engineers believe that you shouldn’t be doing that.
You can ask Git to guess which files were copied in each commit:
git log --find-copies --diff-filter=C --stat
but (a) you probably didn’t know this existed,
(b) you’re not going to remember it,
and (c) Git’s heuristics often produce incorrect answers.
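Here is a rough sketch of what the workaround looks like today;
the file names are invented for illustration,
and the Copied-From trailer is a convention I just made up rather than something Git understands:

# Copy a notebook and record where it came from in a commit message trailer.
cp analysis-2024.ipynb analysis-2025.ipynb
git add analysis-2025.ipynb
git commit -m "Start the 2025 analysis from a copy of the 2024 one" \
           -m "Copied-From: analysis-2024.ipynb"

# Later, find the commits that recorded copies this way:
git log --grep='Copied-From:' --oneline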
So let’s add git cp so that the log records copying events explicitly.
That will allow us to trace the lineage of copied-and-modified notebooks and scripts
(and the copied-and-modified configuration files that software engineers create
because they don’t think of YAML and TOML as code).
Doing this won’t solve all our traceability problems,
but I think it will solve some of them,
and we’ll learn something useful from its failure if it doesn’t.
My daughter gave me a set of story dice for Christmas a couple of years ago.
One of them has gone missing,
which makes me wonder if there are now stories I’m no longer able to tell.
Time to make another cup of tea. If you came in peace, be welcome.
My daughter was profligate in her use of kitchenware:
she would always grab a fresh glass for water instead of re-using the one that she had just emptied,
and never used three saucepans to cook when she could use five.
Now that she has left home,
I only need to load the dishwasher once a day.
But I wish I still had to load it twice.
Time to make another cup of tea. If you came in peace, be welcome.
I recently received mail from someone working on a software-based approach to fault tolerance.
Their tool makes applications more reliable,
but they think it also makes developers more productive
by reducing the amount of error detection and handling code they need to write.
They have never been able to find research
that quantifies how much time developers spend on code for detecting and handling problems
relative to the effort for the “happy path”.
They know it’s substantial,
and (probably) increasing as applications become more distributed,
but the only number they’ve found is from a 1995 book called
Software Fault Tolerance,
where Dr. Flaviu Cristian says that it often accounts for more than two-thirds of code in production systems.
So I asked a dozen researchers I met through It Will Never Work in Theory
if they knew of anything,
and the answer was, “No, there isn’t anything that specifically addresses that question.”
This strikes me as odd,
because it wouldn’t be hard to measure
and the answer would be interesting.
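For example,
here is a crude sketch of a first-pass measurement.
It assumes a Python codebase
and treats any line that mentions an error-handling keyword as error-handling code,
which undercounts the bodies of handlers and overcounts incidental matches,
but a real study could swap in a proper parser:

# Count the lines that mention Python's error-handling keywords,
# then compare with the total number of lines. Crude, but a starting point.
find . -name '*.py' -exec cat {} + > /tmp/all-source.py
total=$(wc -l < /tmp/all-source.py)
handling=$(grep -cwE '(try|except|raise|finally)' /tmp/all-source.py)
echo "$handling of $total lines mention error handling"

A real study would want language-specific equivalents
(catch blocks in Java, if err != nil in Go)
and a parser rather than grep,
but the basic measurement really is that simple.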
People do throw around questionable numbers about the cost of bugs and bug fixing,
e.g., the claim that bugs cost companies $2 trillion in 2020.
Here are some other related resources my contacts were able to give me:
Today Was a Good Day: The Daily Life of Software Developers:
Developers spend about 11% of their time on debugging and bugfixing,
with some days dominated by that work (up to 32%)
and others, dominated by meetings and collaboration, dropping as low as 4-6%.
You can also add time spent on testing (up to 16%).
Again,
the fact that we don’t have reliable figures for this strikes me as odd.
As one of them pointed out,
while everyone is throwing LLMs at often artificial and academic problems
and then claiming to have improved some arbitrary metric X% over a random baseline,
we still don’t know fairly basic things about software development.
My thanks to everyone who responded to my late-night email about this.
Later:
this post made the #8 spot on Hacker News.
It must have been a slow day…
For every beginning there must be an ending,
but we don’t like to talk about that,
particularly not in the tech industry.
There are thousands of books in print about how to start a business,
but only a handful about how to pass one on,
and many of those are really about how to sell out at the right time.
I have experienced a lot of endings,
and the most important thing I’ve learned is that
they can be dignified and fulfilling if done well.
I also think that preparing for the end can make it less likely,
and make what happens before it more enjoyable.
However,
a lot of people aren’t being given the chance
to wind things down gracefully.
Between the Trump administration’s attack on science
and the cuts big tech companies are making in the name of AI,
thousands of people are being given days (or less) to end years of work.
I am therefore assembling material for a half-day workshop on project closure.
If you or someone you know has ended a software project or scientific research project,
I’d be very grateful if you could spare half an hour for an online interview:
you can reach me by email at gvwilson@third-bit.com.
Note: all discussion will be confidential,
and everyone interviewed will be able to review and veto anything that mentions them
before it is seen by anyone else.
Learner Personas
There are important differences between deliberate closure
(shutting a project down of your own accord and on your own timeline),
and abrupt closure
(shutting it down on short notice under difficult circumstances).
This workshop therefore caters to two learner personas.
Vaida
Vaida, 33, has a PhD in oceanography
and now works as a data analyst for the Ministry of the Environment.
She has been collecting and publishing beach erosion data for the past six years.
She also co-founded a volunteer group that teaches environmental science to high school students,
and has been its leader for the past five years.
Vaida is relocating to pursue a new career opportunity,
so she wants to wind down her data collection project.
She also wants the volunteer group to continue its work,
but the only documentation of how it operates is
one slide deck and a couple of out-of-date blog posts.
Vaida is working hard to prepare for her new job,
which means she only has two or three hours a week for the next couple of months
to put into tidying things up.
Liam
Liam, 41, worked as a civil engineer for almost a decade
before becoming a full-time software developer
at a company that does contract work
modeling slope stability for large construction projects.
While Liam writes lots of tests and uses Git and GitHub to share work with his colleagues,
very little of what he knows about OpenStabil
(the company’s open source software package)
has ever been written down.
Liam’s group was acquired by another engineering firm sixteen months ago.
After an abrupt change of leadership,
the company has decided to merge parts of OpenStabil into a closed-source tool suite
and to stop all further development of the open version.
Liam has been told to make these changes immediately;
after protest,
he has been given until the end of the week.
Liam is deeply invested in the small but tight-knit OpenStabil community,
but has a young family at home
and doesn’t dare risk being unemployed.
I was going to title this post “Two Great Tastes That Taste Great Together”,
but I expect most of my readers are too young to get the reference,
so I’ll just dive right in:
Glitch gave literally millions of people a chance to build something on the web
without having to wrestle with NPM or webpack
or set up a server
or deal with any of the other crap that Sumana Harihareswara has dubbed
inessential weirdness.
It was beautiful and useful,
but it wasn’t profitable enough for Fastly to keep it alive.
But the idea of a low-overhead in-the-browser way for the 99% to build things
didn’t start with Glitch
and hasn’t died with it either.
Projects like Webbly (source here)
are still trying to let people use the web to build the web.
However,
someone has to host these things somewhere:
who’s going to do that, and where?
More specifically,
can we construct a hosting solution that isn’t tied to a particular company
and therefore doesn’t have a single point of failure?
Well,
what about Mastodon?
Its authors and users are deeply committed to decentralization and federation,
and more people are running servers for particular communities every day.
What if (wait, hear me out)
what if Webbly was bundled with Mastodon
so that Mastodon site admins could provide an in-the-browser page-building experience to their users
simply by saying “yes” to one configuration option?
Why would they do that?
My answer is,
“Take a look at Mastodon’s default browser interface.”
It lets you add a couple of pictures and a few links to your profile,
but that’s less than MySpace offered twenty years ago.
I am 100% certain that if Mastodon came with an easy in-browser page builder,
people would use it to create all sorts of wonderful things.
(Awful ones too, of course, but Mastodon site admins already have to grapple with content moderation.)
Greenspun’s Tenth Rule is that every sufficiently complicated program
contains a mediocre implementation of Lisp.
Equally,
I think every useful web-based tool is trying to be
what Visual Basic was in the 1990s
and what WordPress was to the early web:
useful, right there, and a gradual ramp for new users rather than a cliff to climb.
I think the sort of people who built useful little things with Glitch
would do amazing things with Webbly
if it was married to their social media.
I also think that allowing people to create custom home pages
or tweak their feeds
would draw a lot of new users away from fragile, centralized systems like X and Bluesky.
I know that I’ve been wrong far more often than I’ve been right,
but this really does feel promising.