Blog

Distributed Systems Design by Example

I started work on Distributed Systems Design by Example exactly 212 days ago. The first draft is now done, and I’ve learned a lot while writing it, just as I learned a lot by writing books on software design in JavaScript and Python. However, I don’t know if I will ever finish this book: people don’t seem to read long-form technical writing any longer, and there are several pieces of fiction that I want to get over the finish line. Still, I hope what’s there is useful.

Cognitive Pollution

A couple of weeks ago, I used Claude to vibe code three formative assessment widgets to use in Jupyter and Marimo notebooks. It took less than two hours to get them working, and another 15 minutes to build a fourth. Given how rusty my JavaScript is, and how little I know about the AnyWidget protocol, I believe it would have taken at least a couple of frustrating days to write them by hand. I only have a high-level understanding of how they work (mumble mumble traitlets mumble mumble), but since they are exploratory proofs of concept, I’ve told myself that doesn’t really matter.

Two weeks from now, I will fly from Toronto to London, then to Edinburgh. By doing so, I will be responsible for the emission of approximately one tonne of CO2. That won’t do any measurable harm on its own, but when combined with the emissions of several billion other people, it will make the world my daughter inherits poorer and more dangerous in countless ways. I know that, but I’m still going to get on the plane.

Here’s another analogy. Synthetic opioids have destroyed the lives of hundreds of thousands of people, and while low-level dealers are routinely incarcerated, none of the Sackler family has ever faced serious consequences. On the other hand, the only thing that made the last few months of my brother’s life bearable was a steady drip of those same drugs. I would fight hard against them being banned, but I would also fight hard against them being completely deregulated.

Here’s a third thought. My brother died of mesothelioma, a cancer that is caused by exposure to asbestos. We grew up in a logging town on Vancouver Island; he did cleanup work at the local sawmill as a teenager, but it took decades for the cancer to manifest. I expect it will similarly take years or decades for us to discover the effect chat bots tuned to maximize engagement have had on what young men believe about what women enjoy.

What ties all of this together for me is:

  1. AI is useful.

  2. It is already causing harm.

  3. Saying “just don’t use it” isn’t going to have any more effect than saying “just don’t fly” (or preaching abstinence to teenagers).

  4. The people driving the AI gold rush have proven that they don’t care about anything except adulation and profit.

We now recognize the ill effects of the cognitive pollution caused by social media. I believe current attempts to address them via age verification are naïve; I think it would be more effective to regulate or ban the use of algorithmic ranking based on personal data, but the truth is that I don’t know. I don’t know enough about how safety standards became normal for the chemical, pharmaceutical, food, and transportation industries to feel that my opinions about regulating AI are worth listening to. What I do know is that people have devoted their careers to studying these things, and would probably be willing to explain them to us if we asked.

Looking back, I’m very glad that I took the time to learn a bit about evidence-based pedagogy before telling other people how they should teach. I therefore think that before we make recommendations about what anyone ought to do about AI, we ought to find out what has and hasn’t worked elsewhere. I’ve been thinking about this for a long time, but I still don’t know how to make it happen.

104 Days

It has been 104 days since I was laid off. In that time I have written approximately 64,000 words, of which 75% has been fiction and 25% non-fiction. (These figures don’t include email or social media.) I’ve actually written on all but 30 of those 104 days; at 71%, that puts me a little short of my 75% target but slightly ahead of the 65% of days I’ve managed over the past year.

As for time, I’m averaging about 5 hours a day of trackable activity, which includes exercise, music practice, and pro bono work as well as writing, programming, teaching, and looking for a job. I don’t really know where the rest of the day goes—I don’t believe sleep, chores, Wordle, and an episode or two of Elementary fill nineteen hours out of every twenty-four—but I’m trying not to worry about it.

Ongoing projects include:

One thing I haven’t done much of is read. I used to devour a book or two a week, but these days I find it difficult to get into most fiction, and even harder to read non-fiction. I don’t know if this is because I’m distracted by personal and world events, or whether it’s a stage of life, but I miss losing myself in someone else’s prose for a few hours at a time.

Four Traditions Revisited

Tedre and Sutinen’s paper “Three Traditions of Computing: What Educators Should Know” has shaped my thinking ever since I first read it. This table (reproduced from the paper) summarizes their analysis:

| | Mathematical tradition | Engineering tradition | Scientific tradition |
|---|---|---|---|
| Assumptions | Programs (algorithms) are abstract objects, they are correct or incorrect, as well as more or less efficient – knowledge is a priori | Programs (processes) affect the world, they are more or less effective and reliable – knowledge is a posteriori | Programs can model information processes, models are more or less accurate – knowledge is a posteriori |
| Aims | Coherent theoretical structures and systems | Constructing useful, efficient, and reliable systems; solving problems | Investigating and explaining phenomena, solving problems |
| Strengths | Rigorous, results are certain, utilized in other traditions | Able to work under great uncertainty, flexible, progress is tangible | Combines deduction and induction, cumulative |
| Weaknesses | Limited to axiomatic systems | Rarely follows rigid, preordained procedures; poor generalizability | Incommensurability of results, uncertainty about what counts as proper science |
| Methods | Analytic, deductive (and inductive) | Empirical, constructive | Empirical, inductive, and deductive |

As I wrote three years ago, I’m struck now by what’s not there. I think there should be a fourth column titled “Humanist tradition” that focuses on values, on how computing is used, and on how cognitive and social psychology support, shape, and limit what we can build and how we build it.

I also now think that their distinction between the engineering and scientific traditions isn’t particularly useful. In practice, they are nearly identical attempts to turn software development into an engineering discipline on par with chemical or electrical engineering. UML, requirements engineering, the use of statistical models to predict bug rates: all are signs of “engineering envy”, and by and large, practitioners have voted with their feet and not adopted them.

Instead, the overwhelming majority of the programmers I’ve worked with fall into what I used to call a “craft” tradition, but which I now think has a lot more in common with industrial design. Using Tedre and Sutinen’s categories:

I think this analysis explains why practitioners and software engineering researchers mostly talk past one another. Most researchers subscribe to what Scott’s book Seeing Like a State labelled “high modernism”: they believe comprehensibility and control will come from uniformity and formalism. Practitioners, on the other hand, are defending the local traditions in which they are personally invested. In my idle moments, I wonder where we’d be if that long-ago NATO conference had adopted industrial design as a metaphor instead of engineering.

Updating Snailz

I have updated the synthetic data generator I built last year to generate datasets I can use in my SQL tutorial. I might also use it as a running example if I ever teach a course on software design in Python to researchers.

If Not Lessons, Then What?

I used to think that when I retired, I would spend my time writing short tutorials on topics I was interested in as a way to learn more about them myself. I’ve now been unemployed for three months, and while I’ve written some odds and ends, it’s not nearly as fulfilling as I expected because I know that most people aren’t going to read a three-thousand-word exposition of discrete event simulation: they’re going to ask an LLM, and get something pseudo-personalized in return.

To be clear, I don’t think this is inherently a bad thing: ChatGPT and Claude have helped me build https://github.com/gvwilson/asimpy and fix bugs in https://github.com/gvwilson/sim, and I believe I’ve learned more, and more quickly, from interacting with them than I would on my own. But they do make me feel a bit like a typesetter who suddenly finds the world is full of laser printers and WYSIWYG authoring tools.

I believe I can write a better explanation than an LLM, but (a) I can only write one, not a dozen or a hundred with slight variations to address specific learners’ questions or desires, and (b) it takes me days to do somewhat better what an LLM can do in minutes. I believe I go off the rails less often than an LLM (though some of my former learners may disagree), but is what I produce better enough to outweigh the speed and personalization that LLMs offer? If not, what do I do instead?

First-of in asimpy

Adding a “first of” operation to asimpy required a pretty substantial redesign. The project’s home page describes what I wound up with; I think it works, but it is now so complicated that I’d be surprised if subtle bugs weren’t lurking in its corners. If you (or one of your grad students) want to try using formal verification tools on ~500 lines of Python, please give me a shout.

Trying to Understand asimpy

As a follow-on to yesterday’s post, I’m trying to figure out why the code in the tracing-sleeper branch of https://github.com/gvwilson/asimpy actually works. The files that matter for the moment are:

I’ve added lots of print statements to sleep.py and the three files in the package that it relies on. To run the code:

$ git clone git@github.com:gvwilson/asimpy
$ cd asimpy
$ uv venv
$ source .venv/bin/activate
$ uv sync
$ python examples/sleep.py

Inside src/asimpy/actions.py there’s a class called BaseAction that the framework uses as the base of all awaitable objects. When a process does something like sleep, or try to get something from a queue, or anything else that requires synchronization, it creates an instance of a class derived from BaseAction (such as the _Sleep class defined in src/asimpy/environment.py).

Now, if I understand the protocol correctly, when Python encounters ‘await obj’, it does the equivalent of:

iterator = obj.__await__()  # get an iterator (name avoids shadowing the built-in iter)
try:
    value = next(iterator)  # run to the first yield
except StopIteration as e:
    value = e.value         # get the result

After stripping out docs, typing, and print statements, BaseAction’s implementation of __await__() is just:

def __await__(self):
    yield self
    return None

Looking at the printed output, both lines are always executed, and I don’t understand why. Inside Environment.run(), the awaitable is advanced by calling:

awaited = proc._coro.send(None)

(where proc is the object derived from Process and proc._coro is the coroutine object created by calling the process’s async run() method). My mental model is that value should be set to self because that’s what the first line of __await__() yields; I don’t understand why execution ever proceeds after that, but my print statements show that it does.
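
One way to probe this is a standalone sketch that doesn’t touch asimpy at all. Action, worker, and log below are hypothetical names, and the driver is just two hand-written send(None) calls standing in for whatever Environment.run() does. Under those assumptions, the first send(None) runs the coroutine up to the yield and hands back the yielded object; a second send(None) resumes the generator after the yield, so the return line executes and its value becomes the result of the await expression.

```python
# Standalone sketch, not asimpy code: Action, worker, and log are hypothetical.
log = []


class Action:
    """Stand-in awaitable with the same two-line shape of __await__()."""

    def __await__(self):
        log.append("before yield")
        yield self      # suspend; the driver's send() call returns this object
        log.append("after yield")
        return None     # becomes the value of the `await` expression


async def worker():
    result = await Action()
    log.append(f"await produced {result!r}")
    return "done"


coro = worker()

# First resumption: runs until the yield inside __await__().
first = coro.send(None)
log.append(f"driver received {type(first).__name__}")

# Second resumption: execution continues after the yield, then worker() ends.
try:
    coro.send(None)
except StopIteration as exc:
    log.append(f"coroutine finished with {exc.value!r}")

print("\n".join(log))
```

If the real driver also resumes each process with a later send(), that would account for both lines of __await__() showing up in the printed output.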

And I know execution must proceed because (for example) BaseQueue.get() in src/asimpy/queue.py successfully returns an object from the queue. This happens in the second line of that file’s _Get.__await__(), and the more I think about this the more confused I get.
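
The same effect can be reproduced outside the package. In this sketch Get, consumer, and drive are hypothetical stand-ins (asimpy’s _Get may well do more), and a plain list plays the role of the queue. It assumes only that something keeps calling send(None) on the coroutine: each resumption runs the line after the yield, and whatever that line returns is what await evaluates to.

```python
# Hypothetical stand-ins for illustration; these are not asimpy's classes.
class Get:
    """Awaitable that hands back the first item of a plain list."""

    def __init__(self, items):
        self.items = items

    def __await__(self):
        yield self                  # first line: suspend and return to the driver
        return self.items.pop(0)    # second line: runs when the driver resumes us


async def consumer(items, received):
    while items:
        item = await Get(items)     # the value returned above lands here
        received.append(item)


def drive(coro):
    """Resume the coroutine until it finishes; count how often it suspended."""
    suspensions = 0
    while True:
        try:
            coro.send(None)
        except StopIteration:
            return suspensions
        suspensions += 1


received = []
count = drive(consumer(["a", "b", "c"], received))
print(received, count)
```

The driver never looks at the object that send(None) returns here; a simulation framework would presumably use it to decide when to resume the process, but that part is left out of the sketch.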

I created this code by imitating what’s in SimPy, reasoning through what I could, and asking ChatGPT how to fix a couple of errors late at night. It did all make sense at one point, but as I try to write the tutorial to explain it to others, I realize I’m on shaky ground. ChatGPT’s explanations aren’t helping; if I find something or someone that does, I’ll update this blog post.