A Hard-Boiled Egg

Data Science, Software Engineering, and a Plausible Path to a Slightly Better Future


Greg Wilson


October 2018

License: CC-BY

"You'd like Freedom, Truth, and Justice, wouldn't you, Comrade Sergeant?" said Reg encouragingly.

"I'd like a hard-boiled egg," said Vimes, shaking the match out.

There was some nervous laughter, but Reg looked offended. "In the circumstances, Sergeant, I think we should set our sights a little higher—"

"Well, yes, we could," said Vimes. "But…well, Reg, tomorrow the sun will come up again, and I'm pretty sure that whatever happens we won't have found Freedom, and there won't be a whole lot of Justice, and I'm damn sure we won't have found Truth. But it's just possible that I might get a hard-boiled egg."

— Sir Terry Pratchett, Night Watch

What I'm Building On

Veronika Cheplygina

Weak failure: I did a lot of work and nothing came of it

Strong failure: I did a lot of work and actually did harm

This talk is built on several weak failures

Back in the Twentieth Century

Dr. Dobb's Journal

Book review editor for Dr. Dobb's Journal

Hundreds of textbooks on compilers, but none on debuggers or debugging

Or build tools, or package managers, or…

Early 2000s

Asked to teach a course on software architecture

Looked at two dozen books and other people's courses…

…but none described actual architectures

This, we can fix

Beautiful Code, AOSA Vol 1, AOSA Vol 2, POSA, 500 Lines

Countervailing Currents

Learning should always flow in both directions

What do software engineering researchers actually know that practitioners might care about?

Making Software

Some Beautiful Truths

Languages in the C family are as hard to learn as a randomly designed language

Students don't make the mistakes instructors think they make

No significant difference between test-driven and test-after development

But there is a difference between coarse-grained and fine-grained coding

Most catastrophic failures in distributed data-intensive systems could be prevented by performing simple testing on error-handling code

Git is awful because of the mismatch between users' mental models and its actual operation

Giving it a human-friendly face is ineffective

Computer science grades are not bimodal

I.e., there is no geek gene

Weak Failures

To first order, these books have had no impact

We still don't teach science to CS students

Biologists spend 6 hours/week in the lab

CS students do one experiment in four years

After teaching programming to scientists for 20 years, I've come to believe that science is a good thing

And that statistics is going to be the math that puts the "engineering" in "software engineering"

But the curriculum is full

And we ought to help everyone, not just CS majors

Measuring Things with Body Parts

Thumbs, feet, and arms became inches, feet, and yards

Measure this way in the US because everyone else does

So what tools can we reasonably expect most potential learners to already know?

Or to pick up in one bladder of hands-on instruction?

Today's Choices

A functional reactive language
with an intuitive interface

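library(tidyverse)  # provides read_csv, pmap_lgl, tibble, mutate, filter, and %>%

# List the rows of the CSV file in which every cell is missing or empty.
read_csv('infant-hiv.csv') %>%
  pmap_lgl(function(...) {
    x <- list(...)                  # the values in one row
    all(is.na(x) | (x == ""))       # TRUE if the whole row is blank
  }) %>%
  tibble(empty = .) %>%             # one logical flag per row
  mutate(id = row_number()) %>%     # remember each row's position
  filter(empty)                     # keep only the blank rows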

A beautiful view
but a very steep hill

A Tool That Just Works

Multiple studies have found that people learn programming and computational thinking faster using blocks than using text

Eliminates pointless syntax mistakes

Obvious affordances suggest possibilities

Plays well on touch-screen devices

A Tool That Just Works …For Some Tasks

Manipulating data is clumsy but doesn't have to be

You Can See Where This Is Going…

Build Scratch blocks for manipulating tabular data


Blockly provides the framework

The tidyverse defines the requirements

Millions of people will know how to use it

And deployment is a solved problem

But Wait, I Have Data


AP Student Numbers
Computer Science    26,100    44,000


Statistics is an authentic task

For teachers as well as for students

And Then We Can Teach Engineering

"The use of the scientific method to analyze and design structures, devices, and systems."

Require statistics rather than calculus for entry into CS

Have students analyze software engineering data starting in their first year

For Example

"Given source code for six software projects, determine whether long functions have more bugs than short ones."

Requires tool use, model building, and statistical analysis

Students do science, so they understand and value it, so they engage with it later

Fits into existing curriculum (implements IFirstYearMath)

And is culturally defensible

Then We Can Teach What Really Matters

I no longer believe my generation will fix this

Women in CS

So we need to raise up a generation that will

This Is Also Software Engineering

People of East Asian or South Asian ancestry make up 8% of the Canadian population, but 60-75% of undergraduates in Computer Science at major universities. Write two one-page papers to argue pro and con that this proves people of European descent are naturally less capable of abstract reasoning than their Asian counterparts.

And then compare and contrast your arguments with those made about female under-representation in computing.

Other Gaps on My Shelves

Software Tools in JavaScript: The New Standard Model

Merely Useful: An Introduction to Research Computing

The Undergraduate Operator's Manual

Sex and Drugs and Guns and Code: What Everyone in Tech Needs to Know About Politics, Economics, and Power


Thank you