Software Engineering's
Greatest Hits

far too many slides in far too little time

Greg Wilson

February 2019


The Road So Far

Dr Dobb's Journal
  • Book review editor for Dr. Dobb's Journal
  • Hundreds of textbooks on compilers, but no textbooks on debuggers or debugging
  • Or build tools, or package managers, or…

The Road So Far

  • Asked to teach a course on software architecture
  • Looked at two dozen books and other people's courses…
  • …but no textbooks describe actual architectures




And that got me wondering…

Is Software Engineering?

Street Preacher


Chickens and Eggs

  • We know a lot about software and how it's built
  • But students aren't taught empirical methods

  • Biologists spend 6 hours/week in the lab
  • CS students do one experiment in four years
  • So it's not surprising they (we) don't understand or value the scientific method

The Seven Years War

Sea Battle
  • The British lost 1512 sailors to enemy action...
  • ...and 100,000 to scurvy

It Didn't Have to Happen

James Lind
  • James Lind (1716-94)
  • 1747: the first controlled experiment in medical history
    sea water        cider
    sulfuric acid    vinegar
    barley water     oranges

It Took a While

  • 1950: Hill & Doll publish a case-control study comparing smokers with non-smokers
  • Smoking causes lung cancer
  • Most people would rather fail than change

We Can Do Better

  • Steady growth in empirical studies over the past 20 years
  • Fueled by availability of data
  • And by realization that practitioners find most "classical" software engineering research irrelevant
  • Many studies are small, and not all are well done, but the trend is clear
ICSE 2019

Are Some Languages Better Than Others?

Stefik et al 2013: An Empirical Investigation into Programming Language Syntax

  • First study compared the learnability of
    • Perl
    • Quorum (the language their team is building)
    • Randomo (a placebo whose syntax was "designed" by rolling D&D dice)
  • Conclusion: Perl is as hard for novices to learn as a language with a randomly-designed syntax

We first present two surveys conducted with students on the intuitiveness of syntax, which we used to garner formative clues on what words and symbols might be easy for novices to understand. We followed up with two studies on the accuracy rates of novices using a total of six programming languages: Ruby, Java, Perl, Python, Randomo, and Quorum. To our surprise, we found that languages using a more traditional C-style syntax (both Perl and Java) did not afford accuracy rates significantly higher than a language with randomly generated keywords, but that languages which deviate (Quorum, Python, and Ruby) did.

Are Some Languages Better Than Others?

  • Second study
    • More subjects and multiple assessment strategies
    • Languages in the C family are as hard to learn as a randomly-designed language
    • Ruby and Python are significantly easier
    • Quorum is easier still
  • Reaction has shown just how little most developers know or care about the scientific method
  • Discussed in this podcast

You Can't Ask Them

Altadmri & Brown 2016: 37 Million Compilations: Investigating Novice Programming Mistakes in Large-Scale Student Data

  • Ask educators for learners' most common mistakes
  • Compare their answers to data from the BlueJ Blackbox project
  • Weak consensus among educators
  • Weak correlation with observations
  • Educator experience had only a weak effect on results

We used the Blackbox data set to check whether the educators' opinions matched data from over 100,000 students and checked whether this agreement was mediated by educators' experience. We found that educators formed only a weak consensus about which mistakes are most frequent, [and] that their rankings bore only a moderate correspondence to the students' data.

You Can't Ask Them

  • Most common actual errors are:
    • Mis-matched parentheses (not confusing = with ==)
    • Invoking methods with the wrong arguments is #2
    • Control flow reaching end of non-void method without return is #3
  • The three that take the most time to fix are:
    1. Confusing short-circuit logical operators with their bitwise equivalents
    2. Using == instead of .equals to compare strings
    3. Ignoring the return value from a non-void method
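
The study looked at Java, but the first of these time-consuming mistakes has a close analogue in Python (`and` versus `&`). A minimal sketch, with hypothetical function names, of why the two are not interchangeable:

```python
def first_is_positive_short_circuit(values):
    # `and` short-circuits: values[0] is evaluated only if the list is non-empty.
    return len(values) > 0 and values[0] > 0


def first_is_positive_bitwise(values):
    # `&` evaluates BOTH operands before combining them, so the index
    # expression runs even when the list is empty.
    return (len(values) > 0) & (values[0] > 0)


print(first_is_positive_short_circuit([]))  # False

try:
    first_is_positive_bitwise([])
except IndexError as err:
    print("bitwise version failed:", err)
```

Both versions agree on non-empty lists, which is part of why the mistake takes so long to find.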

Let's Talk About Test-Driven Development...

Erdogmus et al: "How Effective is Test-Driven Development?" (in Making Software, 2010)

[e]vidence from controlled experiments suggests an improvement in productivity when TDD is used. However...pilot studies provide mixed evidence, some in favor of and others against TDD. In the industrial studies...evidence suggests that TDD yields worse productivity. Even when considering only the more rigorous studies...the evidence is equally split for and against a positive effect.

Let's Talk About Test-Driven Development...

Fucci et al 2016: An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach

  • 39 professionals working on real projects
  • Replication of study done by other researchers
  • No significant difference between test-first and test-last development

Method: We analyzed 82 data points collected from 39 professionals, each capturing the process used while performing a specific development task. We built regression models to assess the impact of process characteristics on quality and productivity. Quality was measured by functional correctness. Result: Quality and productivity improvements were primarily positively associated with the granularity and uniformity. Sequencing, the order in which test and production code are written, had no important influence. Refactoring effort was negatively associated with both outcomes. We explain the unexpected negative correlation with quality by possible prevalence of mixed refactoring. Conclusion: The claimed benefits of TDD may not be due to its distinctive test-first dynamic, but rather due to the fact that TDD-like processes encourage fine-grained, steady steps that improve focus and flow.

Let's Talk About Test-Driven Development...

  • "The claimed benefits of TDD may not be due to its test-first dynamic, but rather due to the fact that [it] encourages fine-grained steady steps that improve focus and flow."
  • Discussion has been heated
  • "I practice TDD...and it works great. We don't need to prove that it works anymore... [T]here are some great stories on [my] site."

Code Review

Why don't we teach this??

A Surprising Result

Bird et al 2009: Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista

Geographic distribution has little effect on bug rates

Distribution of team members in the org chart is a much better predictor

More About Diagrams

Cherubini & Venolia 2007: Let's Go to the Whiteboard

  • Look at what developers draw when they're talking to each other...
  • ...and how well they can understand their own drawings hours or days later
  • Diagrams are a cache for short-term memory, not archival...
  • ...which may explain why UML hasn't caught on

Most of the diagrams had a transient nature because of the high cost of changing whiteboard sketches to electronic renderings. Diagrams that documented design decisions were often externalized in these temporary drawings and then subsequently lost. Current visualization tools and the software development practices that we observed do not solve these issues...

What Happens When Teams Go Agile?

Khomh et al 2012: Do Faster Releases Improve Software Quality?

  • Looked at Firefox before and after the transition to rapid release and found:
    1. Users do not experience more post-release bugs
    2. Bugs are fixed faster
    3. When crashes do happen, they happen sooner after startup
  • Still don't have an explanation for that last one...
    • ...which is how science progresses

We found that (1) with shorter release cycles, users do not experience significantly more post-release bugs and (2) bugs are fixed faster, yet (3) users experience these bugs earlier during software execution (the program crashes earlier).

Actionable Findings

Yuan et al 2014: Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems

  • 198 randomly selected, user-reported failures on Cassandra, Hadoop MapReduce, etc.
  • Almost all failures require <=3 nodes to reproduce
  • Error logs typically contain sufficient data to reproduce
  • Majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code
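
One way to read "simple testing on error handling code" is to inject a fault and check what the handler does. The sketch below is an illustration only: `save_record`, its `storage` and `fallback` parameters, and the "queued" return value are all hypothetical, not taken from the paper or from any of the systems it studied.

```python
import logging
from unittest import mock


def save_record(storage, record, fallback):
    """Write `record` to `storage`; on an I/O failure, queue it in `fallback`."""
    try:
        storage.write(record)
        return "stored"
    except OSError as err:
        logging.warning("write failed, queueing for retry: %s", err)
        fallback.append(record)
        return "queued"


def test_write_failure_is_queued():
    # Inject the fault: every call to storage.write raises OSError.
    storage = mock.Mock()
    storage.write.side_effect = OSError("disk full")
    fallback = []

    assert save_record(storage, {"id": 1}, fallback) == "queued"
    assert fallback == [{"id": 1}]


def test_successful_write_is_not_queued():
    storage = mock.Mock()
    fallback = []

    assert save_record(storage, {"id": 2}, fallback) == "stored"
    assert fallback == []
    storage.write.assert_called_once_with({"id": 2})


test_write_failure_is_queued()
test_successful_write_is_not_queued()
```

The first test is the one that usually goes unwritten: it runs the `except` branch, which otherwise executes for the first time in production.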

Actionable Findings

Nakshatri et al 2016: Analysis of Exception Handling Patterns in Java Projects: An Empirical Study

  • Most common catch block logs the error rather than trying to recover from it
  • The next most common either do nothing (20% of cases) or convert the checked exception into an unchecked exception so that it can be ignored
  • Most programmers ignore the exception hierarchy and simply catch Exception (78%) or Throwable (84%)
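
Counting handler patterns like these is easy to try on your own code. The sketch below is a Python analogue, not the authors' method (their study was of Java projects): it uses the standard `ast` module to count `except` clauses that are broad or empty. The category names and the `SAMPLE` source are made up for illustration.

```python
import ast
from collections import Counter

# Exception types treated as "broad" in this sketch (an assumption).
BROAD = {"Exception", "BaseException"}


def classify_handlers(source):
    """Count total, broad, and empty `except` clauses in Python source text."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        counts["total"] += 1
        # A bare `except:` has no type at all; treat it as broad too.
        is_bare = node.type is None
        is_broad_name = isinstance(node.type, ast.Name) and node.type.id in BROAD
        if is_bare or is_broad_name:
            counts["broad"] += 1
        # A handler whose body is only `pass` silently swallows the error.
        if all(isinstance(stmt, ast.Pass) for stmt in node.body):
            counts["empty"] += 1
    return counts


# Hypothetical sample; the code is only parsed, never executed.
SAMPLE = '''
try:
    risky()
except Exception:
    pass

try:
    risky()
except ValueError as err:
    print(err)

try:
    risky()
except:
    log("failed")
'''

counts = classify_handlers(SAMPLE)
print(dict(counts))
```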

Paradise Unplugged

Ford et al 2016: Paradise Unplugged: Identifying Barriers for Female Participation on Stack Overflow

  • Only 5-6% of Stack Overflow contributors are women
  • What do they find significantly more problematic than men?
    1. Lack of awareness of site features
    2. Feeling unqualified to answer questions
    3. Intimidating community size
    4. Discomfort interacting with or relying on strangers
    5. Perception that they shouldn't be "slacking"

...online communities, such as Stack Overflow...only 5.8% of contributors are female.... Through 22 semi-structured interviews with a spectrum of female users ranging from non-contributors to a top 100 ranked user of all time, we identified 14 barriers preventing them from contributing to Stack Overflow. We then conducted a survey with 1470 female and male developers to confirm which barriers are gender related or general problems for everyone.

Open Source in General

Steinmacher et al: Social Barriers Faced by Newcomers Placing Their First Contribution in Open Source Software Projects

  • Identify 58 potential barriers (including 13 social barriers)
  • What matters most?
    1. How easy is it to get set up to make a contribution?
    2. How easy is it to find a task to start with?
  • Other work has also identified "how warmly was my first contribution received?"

...our study qualitatively analyzed social barriers that hindered newcomers' first contributions. We defined a conceptual model composed of 58 barriers including 13 social barriers. The barriers were identified from a qualitative data analysis considering different sources: a systematic literature review; open question responses gathered from OSS projects' contributors; students contributing to OSS projects; and semi-structured interviews with 36 developers from 14 different projects.

There Is No "Geek Gene"

Patitsas et al 2016: Computer Science Grades Are Not Bimodal

  • The "geek gene" is computing's most enduring and damaging myth
  • But only 5.8% of course grade distributions at a large university were actually multi-modal
  • And CS faculty are more likely to see distributions as bimodal if they think they're from a CS class
    • Even more likely if they believe some students are innately predisposed to do well in CS

We statistically analyzed 778 distributions of final course grades from a large... university, and found only 5.8%...passed tests of multimodality. We then... showed 53 CS professors a series of histograms displaying ambiguous distributions and asked them to categorize the distributions. A random half of participants were primed to think about the fact that CS grades are commonly thought to be bimodal; these participants were more likely to label ambiguous distributions as "bimodal". Participants were also more likely to label distributions as bimodal if they believed that some students are innately predisposed to do better at CS.

When I Rule the World

  • Software engineering courses will include assignments like this:
    Given version control repositories for six software projects, determine whether long functions and methods are more likely to be buggy than short ones.
  • Requires tool use, model building, and statistics
  • Encourages students to do science, so they understand it, so they value it
  • Fits into existing curriculum
  • Culturally defensible
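
A rough sketch of the statistics step in such an assignment, assuming the function lengths and bug-fix counts have already been extracted from the repositories. The numbers below are invented for illustration, and Spearman rank correlation is just one reasonable choice of measure:

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical measurements, one entry per function.
function_lengths = [5, 12, 40, 8, 95, 23]   # lines of code
bug_fix_commits = [0, 1, 3, 0, 4, 1]        # commits tagged as bug fixes

rho = spearman(function_lengths, bug_fix_commits)
print(f"Spearman rho = {rho:.2f}")
```

A positive correlation alone would not settle the question: longer functions contain more lines that can be wrong, so students would still need to decide whether to normalize by length, which is the model-building part of the assignment.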

When I Rule the World

And this:

People of East Asian or South Asian ancestry make up 8% of the general population, but 50-60% of undergraduates in Computer Science at major universities. Write two 1000-word position papers to argue pro and con the proposition that this proves people of European ancestry are less capable of logical thinking than people of Asian ancestry.

We may not be able to teach empathy, but we can teach skepticism.

See Also

Evidence-Based Software Engineering Using R

Derek Jones


This is the world we need

Getting started this summer