May 30, 2019: Software Engineering Revisited

I am at the International Conference on Software Engineering for the first time in a decade. It’s been good to catch up with friends, but this fly-by has confirmed several things for me:

  1. I did the right thing leaving academia—I don’t have the patience or diligence to be a good researcher.

  2. Most software engineering research has the same effect on programmers that astronomy has on stars. Some of this is because the problems most working programmers face are hard for researchers to tackle, aren’t considered “interesting”, or both. However, some of it is also due to the gulf between researchers and practitioners, which hasn’t narrowed noticeably since ICSE was in Vancouver in 2009. Our effort to foster a dialog there failed, but…

  3. …I am more convinced than ever that the standard third-year introduction to software engineering based on textbooks like these should be replaced by one that teaches undergraduates how to get, clean, analyze, and understand software engineering data.

I’ve made the case for #3 before, and if anything, the case is stronger now than it was then. If we call the course “Data Science for Software Engineering”, the dean will think it’s a good idea; if we tell faculty in other disciplines that we’re (finally) requiring some math in the core software engineering course, they will nod approvingly; and if we teach students how to use the scientific method to separate fact from fiction, a lot of the fads, half-truths, and outright bullshit floating around the murky pond of industry might finally settle to the bottom.

A course like this would have to include some statistics (say, up to the level of the AP Statistics exam in the US). It would also include a bit of programming to implement analyses, but I think students in the second half of a Computer Science degree can pick up enough of the tidyverse or Pandas in a three-hour lab session to do what they’d need to do. (They could even write their analyses in JavaScript if they wanted to.) The heart of the course, though, would be how to translate imprecise questions into runnable statistical models, and how the choices made along the way affect the answers produced. By the end of the course, I would want students to be able to take a vague question about how software is actually built, make it precise, and answer it with data.
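
To make this concrete, here’s the kind of lab exercise I have in mind. It’s only a sketch: the data is synthetic, the column names are made up, and mining real version-control history is messier, but it shows how two defensible modeling choices give different answers to the same fuzzy question (“are bigger changes buggier?”).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for real project history: commit sizes are heavily
# right-skewed (a few giant commits), and defect counts rise weakly with size.
rng = np.random.default_rng(1)
lines_changed = rng.lognormal(mean=4.0, sigma=1.0, size=200)
defects = rng.poisson(lam=0.5 + 0.002 * lines_changed)
commits = pd.DataFrame({"lines_changed": lines_changed, "defects": defects})

# Choice 1: Pearson correlation assumes a roughly linear relationship on the
# raw values, so the handful of giant commits dominates the estimate.
r_p, p_p = stats.pearsonr(commits["lines_changed"], commits["defects"])

# Choice 2: Spearman correlation uses only ranks, so it is robust to the
# heavy right tail that commit-size data almost always has.
r_s, p_s = stats.spearmanr(commits["lines_changed"], commits["defects"])

print(f"Pearson:  r = {r_p:.2f} (p = {p_p:.3f})")
print(f"Spearman: r = {r_s:.2f} (p = {p_s:.3f})")
```

Neither number is “the” answer; the point is to make students notice the skew in the data, justify their choice of method, and explain how much their conclusion depends on it.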

But I want to prepare students to go further. Peggy Storey’s outstanding keynote at ICSE this morning described three schools of thought in software engineering research:

  1. Natural sciences use the scientific method, focus on evidence-based reality, and value the quantitative over the qualitative.
  2. Social sciences acknowledge that reality is subjective and experiential, expect biases and make them explicit, focus on theory generation, and value the qualitative over the quantitative.
  3. Design sciences (exemplified by medical treatments and engineering solutions) are collaborative and change-oriented, and use a mix of methods. They are shaped by researchers’ political and social lenses; to paraphrase Marx, their goal isn’t to understand the world, but to change it.

The course I’ve outlined above is firmly in category #1, but a lot of the work I value most (like Marian Petre’s) falls into the second category, and I’m personally in the third. I don’t know if it’s possible to give equal weight to all three in a single course; I suspect that if we try to do so, the kind of people who regarded Software Carpentry as “merely useful” will think that we are watering down the course.

It’s worth trying, though. Putting this course together would be a big project, but no bigger than creating a practical introduction to statistics for psychologists or economists. It would be fun, and if anyone needs a little self-interest to spur them on, I strongly suspect that people who’d gone through this as undergrads would pay more attention to software engineering research and be more willing to collaborate with software engineering researchers once they were in industry.
