Empirically Minimal

I’m wrapping up the JavaScript version of Software Design by Example and hope to finish the Python version early next year. Writing these has convinced me that it’s harder to teach with languages like Python or JavaScript than it used to be. These languages have features and libraries that make industrial-strength coding more productive, but that richness is bewildering for newcomers: every online search for “how do I XYZ?” throws them into a sea of concepts, terms, and APIs that they haven’t yet met.

I’m therefore spending a lot of time thinking about minimal languages that are novice-friendly. Hedy is explicitly layered; Quorum’s insistence on testing new features’ usability keeps it small; Lua’s focus on embeddability has the same effect; and materialized thought experiments like Wren are even closer to what I want.

But what exactly do I want? Looking at SDXJS and its Python sibling, and at my programming lessons for scientists, I rely on the following:

  • Atomic types: Boolean, number, text, null
    • JavaScript has convinced me that we don’t need to distinguish integers from reals for novices
    • I think null and NA are interchangeable for this audience as well
    • But I do distinguish Boolean from 1/0 even though the latter is perfectly serviceable
  • Collections: list, set, key-value map, multi-dimensional array, dataframe
    • I prefer sets to maps-without-values for the same reason that I prefer Booleans to 1’s and 0’s
    • Yes, you can implement dataframes and multi-dim arrays using lists and maps, but so many examples are so much easier if they’re first-class citizens
    • Weirdly, I don’t feel the same way about graphs even though I use them pretty frequently
  • Control flow: while loop, iterators, if/else-if/else, with, try-catch
    • By “iterators” I mean “a ‘for’ loop that gives you each value in a collection in order”. This implies some way to extend the set of collections in the language (see below).
    • Syntactic support for resource management (e.g., Python’s with statement) has had a surprisingly big impact on my lessons. Again, it’s most useful if it’s extensible.
    • try-catch has always felt like a clumsy case of “what else are we going to do?” but, well, what else are we going to do? (Please don’t say “multiple return values, one of which is conventionally an error code”.)
  • Extensibility: function, module, class
    • This is where it’s hardest for me to know where to stop. User-definable functions are a no-brainer; I don’t need default parameter values or variable-length argument lists, but if I have to choose I’ll take the former over the latter (because people can always pass “extra” arguments in lists or maps).
    • Modules that are namespaces are similarly a no-brainer.
    • But classes: once you open that up, where do you stop? You need some way for people to create records with named parts, extend the set of things they can iterate over, and so on, but if there’s a core set of features in my writing and teaching, I can’t see it.
  • The weird stuff: coroutines, introspection
    • It may seem weird to put coroutines in a list of “basic” features, but concurrency is too important to be left out of lessons on software design, and coroutines are the easiest way to introduce it.
    • (The easiest-easiest way is actually tuple spaces, but I put them in the same bucket as Lisp-like syntax: if they were going to catch on, they’d have caught on by now.)
    • Introspection might also seem out of place in this list, but so many software design techniques rely on being able to treat code as data that I’d struggle to teach without it.
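
In Python, the extensibility I want for iterators and resource management comes from special methods: a class that defines __iter__ works in a for loop, and one that defines __enter__ and __exit__ works in a with statement. The sketch below only illustrates that mechanism; the class names Countdown and Logged are made up for this example.

```python
class Countdown:
    """A user-defined collection that a for loop can iterate over."""

    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self):
        # A for loop calls __iter__; writing it as a generator keeps it short.
        current = self.start
        while current > 0:
            yield current
            current -= 1


class Logged:
    """A user-defined resource that a with statement can manage."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def __enter__(self) -> "Logged":
        self.log.append(f"open {self.name}")
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Runs whether or not the body raised; returning False lets any exception propagate.
        self.log.append(f"close {self.name}")
        return False


log = []
values = []
with Logged("demo", log):
    for n in Countdown(3):
        values.append(n)
print(values)  # [3, 2, 1]
print(log)     # ['open demo', 'close demo']
```

The same two hooks are what a library author would use to make a new kind of collection or resource feel built-in, which is why I'd want something equivalent in a minimal teaching language.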

But here’s the thing: this list is based on nothing more substantial than an hour-long cruise through lessons I’ve developed. What I really want is for some enterprising graduate student to spend a couple of months going through lessons and textbooks and counting how often various language constructs are used (cf. questions #10 and #48 in my list of software engineering research topics). I think it would be analogous to the way that chip designers work: look at what operations programs execute most often in order to figure out what to optimize. If you’re interested, please give me a shout: this is something I’d be happy to make time for.
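
As a rough illustration of that kind of counting, here is a sketch that assumes the lesson examples are Python source. It uses the standard library's ast module to find which syntax node types each program uses, then reports the percentage of programs that use each one, most widely used first (the same curve question #48 asks about). Node types like For, With, and Try map fairly directly onto the constructs listed above; how to map the rest is an open question, and the function names are hypothetical.

```python
import ast
from collections import Counter


def constructs_used(source: str) -> set[str]:
    """Return the names of the syntax node types (e.g. 'For', 'With', 'Try') in a program."""
    return {type(node).__name__ for node in ast.walk(ast.parse(source))}


def usage_curve(sources: list[str]) -> list[tuple[str, float]]:
    """Percentage of programs that use each construct, most widely used first."""
    counts = Counter()
    for source in sources:
        # A set means each program counts at most once per construct.
        counts.update(constructs_used(source))
    return [(name, 100.0 * n / len(sources)) for name, n in counts.most_common()]


# Three tiny stand-ins for lesson examples (parsed, never executed).
programs = [
    "for x in range(3):\n    print(x)\n",
    "with open('data.txt') as reader:\n    text = reader.read()\n",
    "try:\n    value = int('7')\nexcept ValueError:\n    value = None\n",
]
for name, percent in usage_curve(programs):
    print(f"{name:14} {percent:5.1f}%")
```

A real survey would read files from textbooks' example repositories and would need a similar parser for each language being compared.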

The Least Assholish Person

@ellesystem wrote:

What actually makes a good first language? I’m sure CS ed has grappled with this, but it seems like a daunting question.

@elliewix replied

The language the least assholish person willing to teach in is the best. Anything that’s the best can still be made the worst by the instructor. (sorry, I’m glad people are looking at this but I’m just so exhausted and burned out from the entire enterprise of everything)

She’s right, which means my one-day class on how to teach has a new and better goal: to make people the least assholish teachers they can possibly be.

Software Engineering Research Topics

I was honored to be given ACM SIGSOFT’s “Influential Educator” award in 2020, but I was also surprised: as far as I can tell, projects like Beautiful Code, Making Software, The Architecture of Open Source Applications, and It Will Never Work in Theory haven’t actually had any impact on how software engineering is taught.

However, I have been collecting random software engineering research ideas from friends and colleagues for more than a decade. I know it’s a weird hobby, but I’ve always believed that studying things practitioners are actually curious about would lead to more fruitful collaboration between academia and industry. Here, therefore, are the questions I’ve been asked since I started taking notes ten years ago. I apologize for not keeping track of who wanted to know, but if you’re working on any of these, please get in touch and I’ll try to track them down.

  1. Does putting documentation in code (e.g., Python’s docstrings) actually work better than keeping the documentation in separate files, and if so, by what measure(s)?

  2. Do doctest-style tests (i.e., tests embedded directly in the code being tested) have any impact on long-term usability or maintainability compared to putting tests in separate files?

  3. Which tasks do developers collaborate on most often and which do they do solo most often? (If I’m reading my handwriting correctly, the questioner hypothesized that programmers routinely do bug triage in groups, but usually write new code alone, with other tasks falling in between.)

  4. Are slideshows written using HTML- or Markdown-based tools more text-intensive than those written in PowerPoint? In particular, are slides written in formats that version control understands (text) less likely to use diagrams than slides written with GUI tools?

  5. A lot of code metrics have been developed over the years; are there any for measuring/ranking the difficulty of getting software installed and configured?

  6. How does the percentage of effort devoted to tooling and deployment change as a project grows and/or ages? And how has it changed as we’ve moved from desktop applications to cloud-based applications? (Note: coming back to full-time coding after a decade away, my impression is that we’ve gone from packaging or building an installer taking 10% of effort to cloud deployment infrastructure being 25-30% of effort, but that’s just one data point.)

  7. Has anyone developed a graphical notation for software development processes like this one for game play?

  8. How do open source projects actually track and manage requirements or user needs? Do they use issues, is it done through discussion threads on email or chat, do people write wiki pages or PEPs, etc.?

  9. Has anyone ever done a quantitative survey of programming books aimed at professionals (i.e., not textbooks) to find out what people in industry care enough to write about or think others care about?

  10. Has anyone ever done a quantitative survey of the data structures used in undergraduate textbooks for courses that aren’t about data structures? I.e., do we know what data structures students are shown in their “other” courses?

  11. Has anyone ever compared a list of things empirical software engineering research has “proven” (ranked by confidence) versus a list of things programmers believe (similarly ranked)?

  12. Has anyone ever done a quantitative survey of how many claims in the top 100 software development books are backed by citations, and of those, how many are still considered valid?

  13. Are there any metrics for code fitness that take process and team into account? (I actually have the source for this one.)

  14. Which of the techniques catalogued in The Discussion Book are programmers familiar with? Which ones do they use informally (i.e., without explicit tool support), and how do they operationalize them?

  15. Is there a graphical notation like UML to show the problems you’re designing around or the special cases you’ve had to take into account rather than the finished solution to the problem (other than complete UML diagrams of the solutions you didn’t implement)?

  16. Ditto for architectural evolution over time: is there an explicit notation for “here’s how the system has changed”, and if so, can it show multiple changes in a single diagram or is it just stepwise?

  17. The Turing Test classifies a machine as “intelligent” if an independent observer can’t distinguish between it and a human being in conversation. Has anyone ever implemented a similar test for malicious software (which we should call the Hoye Test in honor of the person who proposed it, or the Moses Test in “honor” of the person who inspired it):
    1. Pick an application (e.g., Twitter).
    2. Build a work-alike that is deliberately malicious in some way (e.g., designed to radicalize its users).
    3. Have people selected at random use both and then guess which is which.
  18. Has anyone ever summarized the topics covered by ACM Doctoral Dissertation Award winners to see what computer science is actually about? (A subject is defined by what it gives awards for…)

  19. Has anyone ever surveyed developers to find out what the most boring part of their job is?

  20. Is there data anywhere on speakers’ fees at tech conferences broken down by age, subject, gender, and geography?

  21. Are programmers with greenery or mini-gardens in the office happier and/or more productive than programmers with foosball tables? What about programmers working from home: does the presence of greenery and/or pets make a difference?

  22. How much do software engineering managers know about organizational behavior and/or social psychology? What mistruths and urban myths do they believe?

  23. Has anyone ever compared how long it takes to reach a workable level of understanding of a software system with and without UML diagrams or other graphical notations? More generally, is there any correlation between the amount or quality of different kinds of developer-oriented documentation and time-to-understanding, and if so, which kinds of documentation fare best?

  24. Is it possible to trace the genealogy of the slide decks used in undergrad software engineering classes (i.e., figure out who is adapting lessons originally written by whom)? If so, how does the material change over time?

  25. How do people physically organize coding lessons when using static site generators? For example, do they keep example programs in the same directory or subdirectory as the slides, or keep the slides in one place and the examples in another? And how do they handle incremental evolution of examples, where the first lesson builds a simple version of X, the next lesson changes some parts but leaves others alone, etc.?

  26. Has anyone ever applied security analysis techniques to emerging models of peer review to (for example) anticipate ways in which different kinds of open review might be gamed?

  27. Has anyone ever written a compare-and-contrast feature analysis of tools for building documentation and tutorials? For example, how do Sphinx, Jekyll, and roxygen stack up?

  28. Käfer et al’s paper comparing text and video tutorials for learning new software tools was interesting: has anyone done a follow-up?

  29. Bjarnason et al’s paper on retrospectives was interesting: has anyone looked in more detail at what developers discuss in retrospectives and (crucially) what impact that has?

  30. Has anyone studied adoption over time of changes (read: fixes) to Git’s interface? For example, how widely is git switch actually now being used? And how do adopters find out about it?

  31. Same questions for adoption of new CSS features.

  32. Is there any correlation between the length of a project’s README file and how widely that software is used? If so, which drives which: does a more detailed README drive adoption or does adoption spur development of a more detailed README?

  33. Do any programming languages use one syntax for assigning an initial value to a variable and another syntax for updating that value, and if so, does distinguishing the two cases help? (Note: I think the person asking this question initially assumed that Python’s new := operator could only be used to assign an initial value.)

  34. How, when, and why do people move from one open source project to another? For example, do they tend to move from a project to one of its dependencies or one of the projects that depends on it? And do they tend to keep the same role in the new project or use the switch as an opportunity to change roles?

  35. How often do developers do performance profiling, what do they measure, and how do they measure it?

  36. Has anyone ever created something like Sajaniemi’s roles of variables for refactoring steps or test cases? (Note: the person asking the question is a self-taught programmer who found Gamma et al’s book a bit intimidating, and is looking for beginner-level patterns.)

  37. Has anyone defined a set of design patterns for the roles that columns play in dataframes during a data analysis?

  38. (How) does team size affect the proportion of time spent on planning and the accuracy of plans?

  39. Is there any way to detect altruism in software teams (i.e., how much time developer A spends helping developer B even though B’s problem isn’t officially A’s concern)? If so, is there any correlation between altruism and (for example) staff turnover or the long-term maintainability of the code base?

  40. Is there any correlation between the quality of the error messages in a software system and the quality of the community? (Note: by “quality of the community”, I believe the questioner meant things like “welcoming to newcomers” and “actually enforces its code of conduct”.)

  41. If you collect data from a dozen projects and guess which ones think they’re doing agile and which aren’t, is there anything more than a weak correlation to what process team members tell you they think they’re following? I.e., are different development methodologies distinct rhetorically but not practically?

  42. What are students taught about debugging after their introductory courses? How much of what they’re explicitly taught is domain-specific (e.g., “how to debug a graphics pipeline”)?

  43. Can we assess students’ proficiency with tools by watching screencasts of their work? And can we do it efficiently enough to make it a feasible way to grade how they code (as well as the code they write)?

  44. A lot of people have built computational notebooks based on text formats (like Markdown) or that run in the browser. Has anyone built a computational notebook starting with Microsoft Word or OpenOffice, i.e., embedded runnable code chunks and their output in a rich document?

  45. When people write essay-length explanations about error handling or database internals, how do they decide what’s worth explaining? Is it “I struggled to figure this out and want to save you the pain” or “I’m trying to build my reputation as an expert in this field” or something else?

  46. Has anyone done a study that plots when people get funded on a loose timeline of “building a startup” broken out by founders’ characteristics? I.e., if 0 is “I have an idea” and 100 is “I have a fully functioning company”, where do most black/brown founders get funded vs. other poc founders vs. white founders?

  47. Has anyone analyzed videos of coding clubs for children or teens to see if girls are treated differently than boys by instructors and by their peers?

  48. How does the distribution of language constructs actually used in large programs vary by language? For example, if we plot percentage of programs that use feature X in a language, ordered by decreasing frequency, how do the curves for different languages compare?

  49. Is it possible to calculate something like a Gini coefficient to see how effectively scientists use computing? If so, is inequality static, decreasing, or increasing? (Note: the questioner felt strongly that the most proficient scientists are getting better at programming but the vast majority haven’t budged in the last three decades, so the gap between “median” and “best” is actually widening.)

  50. If you train a Markov text generator on your software’s documentation, generate some fake man pages, and give users a mix of real and fake pages, can they tell which are which?

  51. How does the number of (active) Slack channels in an organization grow as a function of time or of the number of employees?

  52. How well are software engineering researchers able to summarize each other’s work based solely on the abstracts of their research papers, and how does that compare to researchers in other domains?

  53. Second-line tech support staff often spend a lot of time explaining how things work in general so that they can solve a specific problem. How do they tell how much detail they need to go into?

  54. Is there a notation like CSS selectors for selecting parts of a program to display in tutorials? (Note: I’ve used several systems that relied on specially-formatted comments to slice sections out of programs for display; the questioner was using one of these for the first time and wondering if there was something simpler, more robust, or more general.)

  55. How does the order in which people write code differ from the order in which they explain code in a tutorial and why?

  56. Has anyone built a computational notebook that presents a two-column display with the code on the left and commentary on the right? If so, how does that change what people do or how they do it?

  57. Is it possible to extract entity-relationship diagrams from programs that use Pandas or the tidyverse to show how dataframes are being combined (e.g., to infer foreign key relationships)?

  58. What percentage of time do developers spend debugging and how does that vary by the kind of code they’re working on?

  59. At what point is it more economical to throw away a module and write a replacement instead of refactoring or extending the module to meet new needs?

  60. Are SQL statements written in execution order easier for novices to understand or less likely to be buggy than ones written in standard order? (Note: the questioner was learning SQL after learning to manipulate dataframes with the tidyverse, and found the out-of-order execution of SQL confusing after the in-order execution of tidyverse pipelines.)

  61. Which error recovery techniques are used in which languages and applications, and how often?

  62. What labels do people define for GitHub issues and pull requests, and do they take those labels with them to new projects or re-think each project?

  63. Has anyone ever taught software engineering ethics by:
    1. Creating a set of scenarios, each with multiple-choice options.
    2. Having an ethics expert determine the best answer for each.
    3. Having students and professionals answer the same questions.
    4. Analyzing the results to see how well each group matches the experts’ opinions and whether practitioners are any better than students.
  64. Has anyone ever studied students from the first year to the final year of their program to see which tools they actually start using, and when? In particular, when (if ever) do they start to use more advanced features of their IDE (e.g., “rename variable in scope”)?

  65. Underrepresented groups often develop “whisper networks” to share essential knowledge (e.g., a young woman joining a company might be taken aside for an off-the-record chat by an older colleague and cautioned about the behavior of certain senior male colleagues). How have these networks changed during the COVID-19 lockdown?
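
For anyone who hasn't met them, this is what a doctest-style test (question #2) looks like in Python: the examples live in the function's docstring, and the standard library's doctest module runs them and compares the actual output with what the docstring shows. The mean function is just a placeholder.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers.

    >>> mean([1, 2, 3])
    2.0
    >>> mean([5])
    5.0
    """
    return sum(values) / len(values)


if __name__ == "__main__":
    import doctest

    # Finds the >>> examples in this module's docstrings and checks their output.
    doctest.testmod()
```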
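
The note on question #33 refers to Python's := operator (added in Python 3.8 by PEP 572). As this small sketch shows, it can bind a brand-new name in the middle of an expression or rebind a name that already exists, so Python doesn't use it to separate initialization from update. The variable names are illustrative only.

```python
readings = [3, 8, 1, 9]

# := can introduce a new name in the middle of an expression...
if (largest := max(readings)) > 5:
    print("largest reading:", largest)

# ...and it can just as easily update a name that already exists.
total = 0                          # plain = creates the name
while (total := total + 4) < 10:   # := rebinds it on every pass
    print("running total:", total)
print("final total:", total)       # final total: 12
```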
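
For question #49, the Gini coefficient itself is easy to compute once you have some per-person score; what that score should be (tool usage, a skills survey, something else) is the hard part and is simply assumed here. This sketch uses the standard formula for sorted data, which gives 0 when everyone has the same score and approaches 1 when, in a large group, one person holds almost all of the total. The scores in the usage line are hypothetical.

```python
def gini(values: list[float]) -> float:
    """Gini coefficient of non-negative scores: 0 means everyone has the same score."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Standard formula for sorted data, with ranks starting at 1:
    # G = 2 * sum(rank * x) / (n * total) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical proficiency scores for five scientists.
print(gini([2, 3, 3, 4, 20]))
```

Computing this for the same population in different years would answer the "static, decreasing, or increasing" part of the question, provided the scoring method stayed the same.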
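
The experiment in question #50 only needs a very simple generator. Below is a minimal sketch of a word-level Markov chain: it records which words follow which in the training text, then walks those transitions at random. A real study would train on actual man pages and would probably use more than one word of context; the tiny corpus here is invented.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the training text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int, rng: random.Random) -> str:
    """Walk the chain from start, choosing each next word at random from its observed followers."""
    output = [start]
    while len(output) < length:
        followers = chain.get(output[-1])
        if not followers:  # dead end: nothing ever followed this word
            break
        output.append(rng.choice(followers))
    return " ".join(output)


# An invented stand-in for real documentation.
docs = (
    "the ls command lists files . the cp command copies files . "
    "the rm command removes files ."
)
chain = build_chain(docs)
print(generate(chain, "the", 8, random.Random(42)))
```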
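
One way to approach question #54 without specially-formatted comments is to address definitions by name, the way a CSS child selector addresses nested elements. The sketch below uses a hypothetical dotted-path notation (Stack.push means "the push method directly inside the Stack class") and relies on ast.get_source_segment (Python 3.8 and later) to slice out the matching source. It can't select an arbitrary run of lines inside a function, which is one thing comment-based markers handle better.

```python
import ast


def select(source: str, path: str) -> str:
    """Return the source of the class or function named by a dotted path such as 'Stack.push'."""
    definitions = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    body = ast.parse(source).body
    node = None
    for name in path.split("."):
        # Look only at definitions directly inside the current scope,
        # much like a CSS child selector.
        matches = [n for n in body if isinstance(n, definitions) and n.name == name]
        if not matches:
            raise KeyError(f"no definition named {name!r} in {path!r}")
        node = matches[0]
        body = node.body
    return ast.get_source_segment(source, node)


example = '''
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()
'''
print(select(example, "Stack.push"))
```

Because the selector names code structure rather than line numbers, it keeps working when lines are added above the selected definition, though it breaks if the definition is renamed.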
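
To make question #60 concrete: SQL clauses are written as SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, but they are conceptually applied as FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY (database engines may do the actual work differently, as long as the result is the same). The sketch runs one query in SQLite and then rebuilds the same result as a plain Python pipeline written in that logical order, which is roughly the order a tidyverse pipeline follows. The table and its contents are made up.

```python
import sqlite3

rows = [("north", 4), ("north", 6), ("south", 1), ("south", 2), ("east", 9)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (site TEXT, value INTEGER)")
db.executemany("INSERT INTO samples VALUES (?, ?)", rows)

# Written order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY.
query = """
    SELECT site, SUM(value) AS total
    FROM samples
    WHERE value > 1
    GROUP BY site
    HAVING SUM(value) > 5
    ORDER BY total DESC
"""
sql_result = db.execute(query).fetchall()
db.close()

# The same query in logical order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
filtered = [(site, value) for site, value in rows if value > 1]          # FROM + WHERE
groups = {}
for site, value in filtered:                                             # GROUP BY
    groups.setdefault(site, []).append(value)
kept = {site: vals for site, vals in groups.items() if sum(vals) > 5}    # HAVING
selected = [(site, sum(vals)) for site, vals in kept.items()]            # SELECT
pipeline_result = sorted(selected, key=lambda row: row[1], reverse=True)  # ORDER BY

print(sql_result)       # [('north', 10), ('east', 9)]
print(pipeline_result)  # [('north', 10), ('east', 9)]
```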

Thoughts on the Hippocratic License

For the last couple of years I’ve been putting the Hippocratic License on my personal projects, which basically says, “You’re free to use this software as long as you don’t violate human rights treaties.” Once in a while I get pushback: people say that it’s provocative, unenforceable, redundant (because most countries have signed those treaties), or that if you put restrictions on use then the software isn’t really “open”. (Sometimes they say these things quite forcefully.) So in reverse order:

  1. Licenses like the GPL put restrictions or conditions on use too; they just pick ones that a certain breed of hacker in the 1980s and early 1990s was comfortable arguing about. (I know: I was one of them.)
  2. As for “redundant”: sometimes I imagine how companies that depend on open software would react if a major project adopted the Hippocratic License and they had to affirm publicly that they weren’t violating the Universal Declaration of Human Rights. Those are pleasant daydreams…
  3. Unenforceable? Laws banning discrimination in hiring were also regarded as unenforceable (“What, you’re going to have someone from the government sit in on every job interview?”), but their existence forces potential violators to think twice.
  4. Finally, yes, choosing the Hippocratic License is provocative, but so was choosing any open license when I was younger. I’m old enough to remember when the things taken for granted by most of the people reading this post were considered unrealistic. Python and JavaScript as mainstream languages? Speaker lineups at tech conferences that aren’t 100% male? Pffft—never gonna happen, dude. (Yes, I’m also old enough to remember when people unironically called each other “dude”.)

Everyone thinks the world they first encounter is normal. Everyone forgets the people before them had to build that “normal”. So don’t tell me the Hippocratic License isn’t a real license just because GitHub doesn’t offer it as an option when you set up a new repo. And please don’t tell me you’re defending people’s rights if the only rights you’re defending are those related to commercial transactions, because I’d like the next generation to be able to think of a better world as “normal”.

Empirical Software Engineering Vignettes

I have been arguing for years that we should replace the standard undergraduate course on software engineering with one that:

  1. shows students how to gather and analyze data about programs, programmers, and programming, and then

  2. has them recapitulate key findings from empirical software engineering research.

The vignette below is an example of what I mean. It introduces several important ideas and would naturally lead into in-class analysis of how quickly the students themselves could complete some small tasks, how big and how correct their solutions were, and so on. A re-analysis of Fucci et al’s data on test-driven development or a repeat of Ragkhitwetsagul et al’s study of toxic code snippets on Stack Overflow would be just as much fun, and would help students learn about code analysis tools and practical data science as well as teaching them what we actually know about software engineering and why we believe it’s true.

I think it would take one year to build this course, test it in the classroom, and get it into production. I’m pessimistic about uptake (my current job has reminded me just how slowly academic curricula change), but as my spouse keeps reminding me, you miss 100% of the shots you don’t take. If you know someone who would back this project, I’d be grateful for an introduction.

Are some programmers really ten times more productive than average? To find out, Prechelt2000 had a set of programmers solve the same problem in the language of their choice, then looked at how long it took them, how good their solutions were, and how fast those solutions ran. The data is available online; its columns hold the following information:

Column      Meaning
person      subject identifier
lang        programming language used
z1000t      running time for z1000 input file
z0t         running time for z0 input file
z1000mem    memory consumption at end of z1000 run
stmtL       program length in statement lines of code
z1000rel    output reliability for z1000 input file
m1000rel    output reliability for m1000 input file
whours      total subject work time (hours)
caps        subject self-evaluation

The z1000rel and m1000rel columns tell us that all of these implementations are correct 98% of the time or better, which is considered acceptable. The working times are much easier to understand as a box-and-whisker plot of the whours column. Each dot is a single data point (jittered up or down a bit to be easier to see). The left and right boundaries of the box show the 25th and 75th percentiles respectively, i.e., 25% of the points lie to the left of the box and 25% lie to the right of it, and the mark in the middle shows the median:

Figure: Development Time. A box-and-whisker plot showing that most developers spent between zero and 20 hours, but a few took as long as 63 hours.
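
The numbers a box-and-whisker plot draws can be computed with Python's standard library. This is a sketch, not how the figure above was produced: it uses statistics.quantiles with the "inclusive" method, and other tools use slightly different percentile definitions, so their quartiles may differ a little. Here the whiskers run to the minimum and maximum; many plotting tools instead stop them at 1.5 times the interquartile range and draw points beyond that separately. The jitter helper and its spread parameter are hypothetical.

```python
import random
import statistics


def box_summary(values: list[float]) -> tuple[float, float, float, float, float]:
    """Return (minimum, Q1, median, Q3, maximum) for a box-and-whisker plot."""
    # 'inclusive' interpolates linearly between data points; needs at least two values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def jitter(values: list[float], spread: float = 0.1, seed: int = 0) -> list[tuple[float, float]]:
    """Pair each value with a small random vertical offset so overlapping dots stay visible."""
    rng = random.Random(seed)
    return [(value, rng.uniform(-spread, spread)) for value in values]


print(box_summary([1, 2, 3, 4, 5]))  # (1, 2.0, 3.0, 4.0, 5)
```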

So what does this data tell us about productivity? As Prechelt2019 explains, that depends on exactly what we mean. The shortest and longest development times were 0.6 and 63 hours respectively, giving a ratio of 105X. However, the subjects used seven different languages; if we only look at those who used Java (about 30% of the whole) the shortest and longest times are 3.8 and 63 hours, giving a ratio of “only” 17X.

But comparing the best and the worst of anything is guaranteed to give us an exaggerated impression of the difference. If we compare the 75th percentile (which is the middle of the top half of the data) to the 25th percentile (which is the middle of the bottom half) we get a ratio of 18.5/7.25 or 2.55; if we compare the 90th percentile to the 50th we get 3.7, and other comparisons give us other values. The answers to our original question are therefore:

  1. It depends what you mean.

  2. No, good programmers aren’t 10 times more productive than average.

  3. But yes, it’s reasonable to say that they are about four times more productive.
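
The ratios above can be reproduced with a few lines of Python. This sketch assumes the work times are available as a list of positive numbers in hours (for example, the whours column); its percentile helper interpolates linearly via statistics.quantiles, so results may differ slightly from Prechelt's own calculations. The numbers in the usage line are made up and are not Prechelt's data.

```python
import statistics


def percentile(values: list[float], p: int) -> float:
    """The p-th percentile (1 to 99), interpolating linearly between sorted values."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[p - 1]


def spread_ratios(hours: list[float]) -> dict[str, float]:
    """Compare slow and fast work times in the three ways discussed above."""
    return {
        "max/min": max(hours) / min(hours),  # best vs. worst: exaggerates the gap
        "p75/p25": percentile(hours, 75) / percentile(hours, 25),
        "p90/p50": percentile(hours, 90) / percentile(hours, 50),
    }


made_up_hours = [2.0, 4.5, 6.0, 7.5, 9.0, 12.0, 15.0, 21.0, 40.0]  # not Prechelt's data
print(spread_ratios(made_up_hours))
```

Even on invented data, the max/min ratio is far larger than the other two, which is the point of the comparison: the extremes say more about outliers than about typical programmers.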

Prechelt2000: Lutz Prechelt: “An empirical comparison of seven programming languages”. IEEE Computer, 2000, doi:10.1109/2.876288.
Prechelt2019: Lutz Prechelt: “The mythical 10x programmer”. In Sadowski and Zimmermann (eds.), Rethinking Productivity in Software Engineering, 2019.

Slides for Teaching Tech Together

I have posted slides for a one-day workshop based on Teaching Tech Together. Some material was originally created for The Carpentries’ instructor training and RStudio’s instructor certification program; all can be re-used and recycled under the Creative Commons - Attribution license. Corrections and improvements are always welcome, but please keep in mind that this material already makes for a long day: if you wish to add anything, please try to identify something that can be taken out to make room.

Hard Problems

I don’t know how many hard problems there are in computer science, but now that I’m coding for a living again after ten years doing other things, I think the four that matter most to working programmers are (in order):

  1. Figuring out what users actually want.
  2. Managing configuration & deployment. (Used to be an hour a week; it’s now a third of a dev’s time.)
  3. Getting the next person up to speed with the code base.
  4. Handling partial/intermittent failure.

I know there are lots of other interesting challenges worth studying in software engineering research, but solutions to these—practical solutions that people would actually adopt—would have more impact on more programmers than anything else.

The Sisyphus Test

I believe that if you’re serious about diversity, equity, and inclusivity you have to teach people how to hold the powerful to account: how to organize, publicize, and litigate. If you don’t, you’re ducking the hardest part and leaving people hanging when they need help most. I evaluate books and training on how to be a technical manager the same way: if you don’t explicitly cover what to do when your boss or your boss’s boss behaves badly, or if you pretend it can always be resolved by “radical candor” or some similar bullshit, you have failed your audience.

This has led me to what I call the Sisyphus Test. If you see someone rolling a rock up a hill over and over, do you:

  1. Blame them.
  2. Say, “Someone should do something.”
  3. Go help in order to lighten their burden.
  4. Dig up the hill so no one ever has to do this again.

#1 seems to be most conservatives’ default. #2 is the mushy middle, while #3 is favored by well-meaning people who want things to be better but don’t really want anything important to change. If you pick #4 you’re labelled a radical, particularly if you say that we should also make sure that whoever set the task never has the power to do something like that to anyone ever again. To me, though, it seems like the most sensible option.

Side by Side

We finally said goodbye to my mum yesterday. She’s buried beside my dad and my sister in a grassy little field near the place I still, forty years on, think of as “home”.

The trip was supposed to be our first real family holiday in two years, but Sadie and I came down with COVID on our second day, so we spent almost all of it holed up in hotel rooms coughing and sleeping. It’s not how I wanted to say goodbye, but I’m glad they’re all together.

Comes a day, last kiss
Comes a day, last breath
Comes a day—and then another

Poynting Collector

Back in 2020 I asked:

If you’re running a 400KW oscillation overthruster in 5-10 msec bursts, how much shielding do you need to reduce harmful effects from the intermediate vector bosons it throws off to acceptable levels? (Beam alignment is phase neutral but non-moiré.)

One person responded:

The rule of thumb is 218 quanta per KW/ms if you are using lead, but check your supplier for more modern alternatives. We haven’t recommended lead on oscillation thrusters for a decade due to exactly the problem you’re trying to solve.

However, someone else said:

What if you didn’t treat vector bosons as a harmful byproduct to be compensated for but a useful part of the closed loop system? Get a Poynting collector and feed it back into your spin condenser.

It was an interesting idea that sent me down a couple of rabbit holes. (For example, it turns out there’s a thriving market in Soviet-era industrial supplies in Ecuador…) By March 2021 I was far enough along to say:

Beam alignment still isn’t phase-neutral and I still don’t know how I’m going to couple it to the Poynting collector but we’re getting there…

oscillation overthruster

But nothing’s ever done. The first reply asked if I’d rotated the starboard beam collimator properly. After a bit of experimentation I realized that I wouldn’t be able to get the required precision because my diffraction grid is analog rather than digital (see the comment above about Soviet-era industrial supplies).

I set the project aside for a while (COVID, job changes, etc.), but finally found time to come back to it in May and June. Having taken everything apart and put it back together, I now believe the easiest way forward is going to be to take the photozygometer out of this:

laser (not to scale)

and use it to spallate a new diffraction grid. I’ve never done this before, though, so if you have done it with either a lithium chloride or lithium iodide photozygometer, I’d be grateful if you could give me a shout.