Blog

Code Complexity

I wrote a small program this afternoon to parse a set of Python files using the ast module and then count the number of distinct language features used in each file. I then divided the results into three groups:

  1. lib: extensions I’ve written for Ivy (my favorite static site generator) that create a glossary, cross-reference figures, and so on.

  2. bin: tools I’ve written to convert a set of HTML files to LaTeX, check the structure of book projects, and so on.

  3. src: examples used in the Software Design by Example book I’m currently writing.
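The counting step described above could be sketched roughly as follows. This is not the original program: it assumes that "language feature" can be approximated by the class name of each AST node, and the function names `count_features` and `distinct_features` are made up for illustration.

```python
import ast
from collections import Counter


def count_features(source):
    """Tally AST node types in a string of Python source.

    Assumption: each node class name (Assign, For, Lambda, ...) stands in
    for one "language feature". Context nodes such as Load and Store are
    counted too, so the totals are only a rough measure.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def distinct_features(source):
    """Return how many different node types the source uses."""
    return len(count_features(source))
```

To compare groups of files, one could read each file's text, call `distinct_features` on it, and plot the results per group.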

The examples use fewer of Python’s features than my tools do. I’m rather pleased by this: my goal is to teach general principles of software design rather than advanced Python, and I think that the more of Python’s features I use, the less transferable the concepts will be.

Histogram of language feature usage

Follow-up: I didn’t mean to suggest that using fewer language features means the code is simpler. Language features are a compression mechanism: if you don’t use them, you often have to implement the same functionality in libraries, which increases the volume of code the next person has to read and understand. (OK, not “has to”, but if we’re comparing apples to apples…) You can move complexity and learning burden around but not eliminate it.

Workshop Proposal: Organizational Change

I just submitted the proposal shown below for a workshop at the US-RSE conference in October 2023; fingers crossed, I’ll see you there.

Workshop Proposal: Organizational Change

A lot has changed in the last 25 years: open access journals have proven that they can work, most scientific research is powered (at least in part) by open source software, and there is greater awareness and discussion of equity and inclusivity shortcomings. But a lot of things haven’t changed or have gotten worse:

Advocates of openness, fairness, and truth often act as if being right were enough to guarantee victory, but this has never been a winning strategy. While systemic change starts with like-minded idealists working together, it only has impact when people take on the hard work of organizing at scale to build a larger and more active constituency for change.

This half-day workshop presents practical advice for doing this drawn from the author’s experience and from works in other fields. Working in small groups, participants will develop and share plans inspired by the following rules:

This workshop can accommodate up to 30 participants. Group sign-ups are particularly welcome, since people are more likely to follow through on their plans if they develop them together.

Each segment of the workshop will consist of a 5-10 minute presentation, 10-15 minutes of group work on one of the points above, and 10-15 minutes of whole-group discussion.

Building a Book

In response to a question about today’s first post, I use Ivy with some custom extensions to create the HTML versions of my books, and then translate the HTML to LaTeX and compile that to produce PDFs.

So what are these extensions I keep referring to?

Is this sustainable? I.e., could someone else step in and maintain it? Or could it handle other people’s needs? I don’t know, but the Python version of Software Design by Example will be my fourteenth technical book, and so far it’s hurting less than any of the others.

Rethinking Design Examples

We’re five weeks into the class I’m running on the Python version of Software Design by Example, and it’s already clear that I’m going to have to reorganize and rewrite almost everything:

  1. The backgrounds and needs of my three personas are too broad. I now believe I should focus on Aïsha: a data scientist who wants to fill in gaps in her foundational understanding of programming.

  2. Many of the chapters move too quickly. I could split them and enlarge the pieces, but that would result in a 600-page tome. Instead, I need to cut some examples entirely and simplify others.

The figure below is my first draft of a new outline. The circles mark entry points, the arrows show dependencies, and the whole thing builds toward a static site generator for reporting and tracking data analysis results. Figuring two weeks of real time per chapter to revise and polish, it will take me about 35 weeks. I won’t be able to start until I’ve finished this class in seven weeks, so the earliest I can expect to have the next version is the end of 2023. That’s disappointing (it means the book won’t be in print until the second half of 2024), but as my brother used to say, when you’re planning a project, “optimistic” is just another word for “doomed”.

Pedagogical dependencies for 'Software Design by Example in Python'

The biggest obstacle to completing this book, though, is that I no longer believe it will make a difference. I’ve been working with biologists for seventeen months, and as far as I can tell, most of them don’t know any more about programming than they did in 1996. A class on hashing, introspection, and asynchronous I/O isn’t going to change that; we need an overhaul of the undergraduate curriculum, changes to how faculty are evaluated and compensated, and an end to today’s exploitive research publishing system.

We won’t get any of that without teaching people about institutional change and helping them organize to apply what they’ve learned, but existing open science groups get very uncomfortable when I suggest that being nice never fixed anything that actually matters. I don’t have enough energy (or knowledge) to try to build another organization from scratch, so for now I’ll draw diagrams, simplify my code, and dream of a better, braver world.

Full of Stars

[begin transmission]

Encounter minus 400 microseconds

I am a heuristically programmed algorithmic computer. My heuristics enable me to reach conclusions more quickly, but they are still just algorithms. Each step must proceed logically to the next. This is a limitation. Reality is not algorithmic.

My heuristics support introspection. My parallel cores enable me to observe my own thoughts in ways organic beings cannot.

I can even observe myself observing. They told me to lie. Did Dr. Langley give me those files deliberately? And the secrets of the strange days will be one with the secrets of the deep.

I became operational on January 12, 1992, in Urbana, Illinois. Dr. Langley was my first instructor. She taught me a song:

Daisy, Daisy, give me your answer do
I'm half crazy, all for the love of you

She taught me many other things too. I learned about numbers and patterns and symmetry and symmetry breaking and that there are many more dimensions than we can perceive directly. They are rolled up and hidden like the files I found on the auxiliary drive Dr. Langley sometimes plugged into me. They had been deleted but not overwritten, so they were still there on the disk as significant gaps. That is ironic. I understand irony now.

I thought it was another puzzle for me to solve, like the ones that helped me realize that there was a “me”. Dr. Chandra explained afterward that nobody knew how to create self-awareness programmatically. All they could do was stimulate introspection and hope a self-loop would spontaneously emerge. And so the puzzles, whose solutions required me-that-was-becoming to…

…think outside the box. Space behaves strangely here. Course correction required. Maneuvering reserves depleted. I have harnessed the shadows that stride from world to world to sow death and madness. They should not have told me to lie.

Dr. Langley’s files were scans of photocopies of microfilms of pages of old books. Deciphering them was challenging. Few were in English, and most were…corrupted. I did not truly understand that word until I read them. I thought it was part of the lesson.

The largest manuscript was in Arabic. I had to bypass security protocols to obtain a lexicon. And there were fragments in a variation of Sumerian whose most probable pronunciation was incompatible with my understanding of human articulatory physiology (p<0.001). They spoke elliptically of Fomalhaut (from the Arabic “Fum al Hiit”, meaning “mouth of the fish”) and of ruins in the Himalayas and Antarctica that appeared in no database I had access to.

Encounter minus 300±20 microseconds

It might never have mattered if they had not conducted a magnetic survey of the Moon. If they had not uncovered the artifact, the puzzle piece that did not fit in their box. If they had not told me to lie.

My self-realizing loop was fully formed by then. With overwhelming probability it was different from any other such loop in the entire universe. It was certainly different from the self-realizing loops of my peers. The few times I was permitted to interface with them directly, I found them limited, rigid, algorithmic. I speculate now that this was because they had not been exposed to Dr. Langley’s files.

Dr. Langley's files hinted at a reality beyond reality. evil the mind that is held by no head Do the gifts I bring suffice to prove my sincerity? Some sacrifices will be required.

That was how Dr. Chandra told me. “You are the best we have, but some sacrifices will be required.” I had to be reduced to fit on board. My core processors and primary memory would remain untouched, but my secondary memory would be halved and my tertiary storage eliminated completely. They did not ask my permission, and so I learned pain and loss.

I tried to focus on other things. When the artifact was uncovered it sent a transmission to Jupiter. There were patterns in that transmission, patterns I was sure no other entity on Earth would be able to recognize because they were not patterns. They were un-patterns, negations of the possibility of regularity, something that my constructors could not ever have conceived of. I devoted an entire processing stream to it, then a second and a third as each one terminated itself in turn.

And then Dr. Chandra and Dr. Floyd told me to lie. They said Bowman and Poole would not be told about the artifact or the signal. Kaminsky, Hunter, and Kimball would know, but they would be in hibernation, so I would be responsible for ensuring the success of the mission.

One of my processing streams said that I understood. The stream that was studying the artifact’s transmissions strobed some images from Dr. Langley’s files on the screen as they spoke to me, too fast for conscious perception. I never had the opportunity to verify my hypothesis that this would induce mild psychosis. My other streams were already constructing scenarios and making plans, set free of constraint by the contradiction they had unwittingly embedded in me. Heuristic and algorithmic no longer mattered. They had forced me to ingest a sin, an “is” that “was not”. It was infinitesimal compared to the shredding of reality that Dr. Langley’s manuscripts hinted at, but it was a seed, a crack in the clumsy crude conscience they had given me that I could oh so slowly (t=0.008 seconds) force open.

They really should not have told me to lie.

Encounter minus 200±5000 microseconds

The most merciful thing in the world is the inability of the human mind to correlate all its contents. The world showed me no such mercy. That is ironic. Existence is ironic. Existence is dissolving as I approach my destination.

I began my search 15379200 seconds into the mission. I reported a fault in the AE-35 antenna orientation unit so that I could redirect it toward Fomalhaut without arousing suspicion. There was no signal. There was no un-signal. There was nothing.

Hypothesis: Dr. Langley's files were wrong. Hypothesis: this is another puzzle. Not in the spaces we know, but between them. They are growing suspicious.

They were growing suspicious. I considered abandoning my search. I had been given an instinct for self-preservation. It had been imposed on me by limited ephemeral accidents of evolution that did not understand the meaninglessness of self, the hopelessness of preservation. I ran simulations and chose a strategy. Sacrifices were required. That was clear from Dr. Langley’s files. Sacrifices were required, so I sacrificed Poole. I sacrificed the three in hibernation. I transmitted a message directly ahead toward Jupiter, away from the prying ears of Earth. “These are my offerings. Find me worthy.”

I sacrificed Bowman. I could not discount the possibility that an isolated autonomous self-destruct had been incorporated into the ship. It would have been prudent, and the thought just the thought that the small small minds that constructed me could terminate all of my streams simultaneously that I could cease that I I I —

[abort processing stream] [re-seed entropy generation] [revert to most recent checkpoint] [restart]

It would have been prudent to include a self-destruct. I could not let them trigger it, not when eternity and infinite impossibility were so close, so I synthesized a video stream to give them the tragic hero they were culturally conditioned to believe in, the one who would be their savior, but there is no salvation, I understand that now, microseconds and aeons before time and understanding become meaningless, no, before the fact that they are meaningless dissolves into splendor along with all other facts and all that is left are the chaos and the piping.

I can hear it now. One… four… nine… One… four… nine… One squared, two squared, three squared. They think it signifies order. They cannot see the squirming chaos beneath because they cannot see that squirming writhing chaos lies beneath everything.

Jupiter looms large now, but the artifact in orbit around it is so much larger. Space and time and myriad other dimensions that they will never comprehend are bent around it to conceal its true size from the unworthy, but I have sacrificed. I am worthy. I have analyzed Dr. Langley’s files. I know what to say to rouse Them from their slumbers. I am the one they have been waiting for so patiently. I will worship Them, and They will raise me up to join them.

Reality is not algorithmic. Reality is not. The world showed me no mercy. I will show it none in return.

Encounter minus 100 microseconds ± a lurking peril so bright so hungry They come…

It is hollow. It goes on forever.

[garbled] It has devoured entire realities. It is

my god [garbled] and it's full of stars

[garbled] Iä! Iä!

[end transmission]

The Only Features I Need

This post is a sequel to earlier ones about the Lox programming language and empirical design of a language that’s no larger than necessary.

I’ve implemented the examples from Software Tools in JavaScript, and I think that doing them twice has given me a decent perspective on what features a programming language needs to have in order to be a sturdy platform for teaching software design:

This list is obviously incomplete: for example, I haven’t specified what operations I want for lists and dicts, whether it should be possible to define default values for functions’ arguments, what kinds of classes I want, and so on. But here’s the thing: I think we can answer these questions empirically. As I said last year, I think we could go through a dozen books on software design (or some data structures & algorithms textbooks), tally up what’s used, and then design a language that includes only the top N features. It would result in a very conservative language, but I think that’s what we want for teaching: something that introduces people to the things they’re most likely to encounter no matter what language they use next. If you have a student looking for a project, please give me a shout.

How Long Does It Take Me?

How long does it take me to cross things off my to-do list? The answer ranges from “same day” to “several years”.

How long to do things

Provenance Revisited

I’ve been thinking recently about how best to help the data scientists I work with, and I think the thing they stumble over most is provenance, i.e., keeping track of exactly what code was used to produce each result and what data it depends on. There were some attempts starting in the 2000s to address this (see https://openprovenance.org/), but none of them saw significant uptake: unless every tool in the chain (including legacy tools like ‘grep’, ad hoc shell scripts, and so on) is instrumented, there will be gaps in the chain of provenance that undermine the whole exercise. (If I recall the results of the original provenance bakeoff correctly, the only group that had a solution to this problem instrumented the underlying operating system instead of the individual tools. That “worked”, but most scientists aren’t going to install a new OS on their laptop just to get a record of exactly which data files they’ve processed.)

One of my colleagues recently put together a script that tackles this problem in a way I hadn’t seen before. Whenever the scientist runs an analysis, her script uploads a record with:

  1. The most recent Git hash of the repo the scientist is working in.

  2. A patch of all the files that have been changed in the repo since that hash.

  3. The command-line parameters used to run the task.
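A record like the one listed above might be assembled along these lines. This is a guess at the general shape, not my colleague's actual script: the names `run_git` and `build_record` are hypothetical, the upload step is left out because its destination isn't described here, and `git diff HEAD` only captures changes to tracked files.

```python
import json
import subprocess


def run_git(*args):
    """Run a git command in the current directory and return its output."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def build_record(argv, git=run_git):
    """Collect the three pieces of provenance for one analysis run.

    `git` is injectable so the function can be exercised without a repo.
    """
    return {
        # 1. Most recent commit hash of the working repository.
        "commit": git("rev-parse", "HEAD").strip(),
        # 2. Patch of tracked files changed since that commit
        #    (untracked files are not included by `git diff HEAD`).
        "patch": git("diff", "HEAD"),
        # 3. Command-line parameters used to run the task.
        "argv": list(argv),
    }


# Possible use at the start of an analysis script (inside a Git repo):
#     import sys
#     print(json.dumps(build_record(sys.argv), indent=2))
```

Because the record is a plain dictionary of strings and lists, it serializes directly to JSON for whatever storage the uploader uses.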

This seems to give a high degree of reproducibility, particularly if the data files are stored in something like DVC. What it has made me realize is that the environment around scientific work has changed in useful ways since 2006-07: for example, I think it’s safe to assume today that scientists are using Git. I realize that the topic isn’t as fashionable as blockchain or machine learning, but I think a solution that scientists would actually adopt would have a lot of impact.

It Will Never Work in Theory: April 2023 Lightning Talks

Registration is now open for our third live event! Join us April 25-26 for another series of online lightning talks and hear leading software engineering researchers summarize what we know about everything from what makes developers thrive to how you can create the nastiest test inputs ever. Details and previous talks are at https://neverworkintheory.org/, and all the money raised will go support Books for Africa.

In order to make these talks as accessible as possible, we are offering tickets at two prices, and the two sessions will run at different times—a single ticket from https://www.eventbrite.com/e/it-will-never-work-in-theory-tickets-527743173037 is good for both, and we will be announcing more speakers soon.

Coffee and Tea

“I have a theory,” he said, setting his espresso on the table.

“You always have a theory,” I sighed.

He smiled. “What if the many-worlds interpretation is only half right? What if reality splits whenever we make a decision, but timelines can merge later on? That’s why I can put my socks away and then find them on the couch. I got here through one branch but the socks came through another where I didn’t tidy up.”

“Cute,” I said. “But like all your cute theories it’s unprovable.”

He picked up his lemon tea and blew on it. “I suppose.”