Not long ago I interviewed for a community manager position
with a company that’s pretty well known in open source.
They opened by asking me to write paragraph-length answers to some fairly innocuous questions,
but in the second round,
they asked me to do an IQ test and a personality profile test.
Both of these were managed by a third party and done online;
after dithering for a few days
I decided to put my reservations aside and get on with it.
That’s when I discovered that the first page of each test
required me to enter my name,
my email address—and choose my gender,
with “Male” and “Female” being the only options.
At this point my reservations broke out of the basement I’d locked them in
and began clamoring for attention.
As far as I could tell,
the IQ test consisted of number-matching problems;
I’m not going to believe that scores on a test like that need to be adjusted by gender
without seeing (and checking) the data.
And any company in the HR business today should know better than to restrict gender choices
to a binary “M or F”.
So I mailed the recruiter,
who connected me with the head of HR,
who told me that
they were using tests to eliminate unconscious bias in their hiring process.
When I asked if they had any evidence showing a correlation between
test scores and on-the-job performance,
the HR manager said no,
and that such evidence would be very hard to come by.
It took me less than a minute of searching online
to find companies that would give me
an unlimited number of practice sessions
for both the IQ and personality profile tests for a mere £75.
One of these sites hinted very strongly that for a negotiable fee
they’d be happy to provide “exam assistance”,
which meant “we’ll write the exam for you” back when I was a prof.
My conclusion was that the company I’d applied to
was more concerned with appearances than with eliminating bias,
so I withdrew my application.
I’m very privileged in being able to turn down jobs I don’t want.
You might not be able to do that—you
might have to set aside your queasiness and do the tests
even though your gut tells you their flaws are dangerous—but
I want you to know that your instincts are right.
And if you are able to say “no” when you’re in a situation like this,
please do so,
because every time someone senior says “yes”,
it’s that much harder for someone junior to refuse.
As I announced a few days ago,
I am running an online workshop on “Managing Research Software Projects” on September 29-30
to raise money for MetaDocencia.
Tickets are available on Eventbrite
and the curriculum and schedule are taking shape:
I’d be very grateful for feedback on what’s missing
and what you think doesn’t need to be covered.
- Who you are, who's teaching this workshop, what we're going to cover, and how to participate.
- How to ensure meetings are short, productive, and fair.
- Figuring out who gets to decide what and how to tell when a decision has been made.
- How to organize project collateral, and how to make work findable, reproducible, and shareable.
- How to attract new contributors and help them feel welcome and be productive.
- Guidelines for creating software that is maintainable and reliable.
- Managing who does what and when.
- Things you need to do now that you're in charge.
- What we actually know and why we believe it's true.
- Because sometimes the only way to fix a problem is to fix the institution.
We may also delve into a few bonus topics:
- Making research software robust enough to be run by anyone, anywhere.
- Building good working relationships between academia and industry.
- Making work more findable.
- The basics of personal digital safety (because having a higher profile may attract the wrong kind of attention).
I am running an online workshop on “Managing Research Software Projects” from 10:00-14:00 Toronto time on Sep 29-30 to raise money for MetaDocencia, an inclusive and collaborative community that improves education by empowering instructors from underserved countries. The workshop introduces the ideas and tools you need to manage a team of up to a dozen people working together to build research software. The workshop is intended for new faculty who are setting up their labs, the creators of open source projects that now have other contributors, and everyone else who finds themselves wrangling people and deadlines as well as code. Topics will include:
- how to run an effective meeting,
- how to recruit new contributors and help them be productive,
- how to prioritize work, and
- how to design complex software and ensure its reliability.
If your company would like to sponsor this workshop, the best way is to purchase tickets for people who couldn’t otherwise take part—please contact me for details, and thank you in advance.
Jo, 31, completed a PhD in geology several years ago and now works for a national laboratory.
The fracture modeling software they wrote in grad school is now being used by two dozen research groups around the world,
several of which have started contributing fixes and extensions of varying quality.
Jo has just been given a post-doc and a junior programmer to expand the code as well,
and wants to learn how to decide which pull requests are safe to merge,
decide what’s most important to work on next,
and handle people who spend more time arguing on Slack than they do writing code.
This workshop will show them what a healthy mid-sized project looks like
and how to manage both staff and external contributors.
After reviewing three books more-or-less titled “Data Science for Social Scientists”,
I think what our field desperately needs is “Social Science for Data Scientists”.
I don’t know enough to write it, but I’ll pre-order a bunch of copies…
Someone replied, “…what you’re asking for is a textbook - those exist.”
With respect, I disagree.
Most people—including most working programmers—won’t read a textbook unless they absolutely have to.
Saying that we can fix this problem by writing (or pointing at) a textbook
is like saying that abstinence programs are the solution to teen pregnancy:
it allows you to claim you’ve solved the problem
without threatening anything else you want to believe in.
This response highlights a perspective I’ve struggled against for many years.
Brent Gorda and I chose the name Software Carpentry
because it wasn’t software “engineering”.
We wanted to teach people the equivalent of hanging drywall and fixing leaky taps,
not the equivalent of digging the Channel Tunnel.
What we learned is that (a) carpentry is more useful to most people than engineering,
but (b) skilled trades have lower social status
than “gentlemanly” pursuits involving multi-colored algebra on whiteboards.
Academia reacts the same way to popularization:
John Kenneth Galbraith and Carl Sagan were both looked down on by many of their peers
for deigning to explain their fields to non-specialists.
Those who do popularize can have tremendous impact, and not always for good.
Freakonomics persuaded literally millions of readers
that the only valid way to analyze social interactions was through the lens of personal interest.
By doing so,
it built a constituency for changes in legislation and taxation
that have fueled increasing inequality in society.
So if “programmers and data scientists don’t understand how society works” is the problem,
I don’t think textbooks are the answer.
They can help people who are already convinced they want to know more to keep learning,
but “already convinced” is the hard part.
Instead, we need someone who knows enough to know what corners to cut
and what simplifications will not mislead,
and who doesn’t think that being comprehensible is somehow shameful.
We need someone who can explain to programmers steeped in Silicon Valley’s small-L libertarian zeitgeist
why racial discrimination persists even though it’s economically inefficient,
how regulatory capture works,
why CEOs keep getting away with sexual assault,
and why Radical Candor is bullshit in the service of power.
If this is you,
please give me a shout.
I’m repeating my webinar on Software Design for Data Scientists on Sept 1:
tickets are available through Eventbrite
(at two prices depending on your circumstances),
attendees will get a £100 e-book voucher from the good folks at CRC Press,
and all proceeds will go to Books for Africa.
I hope to see you there.
My webinar on “Software Design for Data Scientists” raised over $600 for Books for Africa. The key points are summarized below; if you’d like me to give the talk to your company, university, or other organization, I’d be happy to do so in exchange for a donation to BfA or some other mutually-agreed charity.
Rule 0: Computer scientists aren’t taught software design either, so don’t feel like you’ve missed something.
Rule 1: Design for people’s cognitive capacity (7±2, chunking, and all that).
Rule 2: Design toward widely-used abstractions and maximize the ratio of “what’s unique in this statement” to boilerplate, but remember that the tradeoff between abstraction and comprehension depends on how much people already know.
Rule 3: Design for evolution, because the problem, the tools, and your understanding will all change each other. The key tool for this is information hiding; the Liskov Substitution Principle and Design by Contract will help.
Rule 4: Design for testability - not just because you want to test, but because it’s a way to validate designs.
Rule 5: Design as if code was data, because it is. Programs are just text files; code in memory is just another data structure, and taking advantage of this can make designs much more elegant (but also less comprehensible to newcomers).
Rule 6: Design for delivery - organize your code the way your packaging system expects, handle errors instead of printing them and discarding them, and use a proper logging system.
Rule 7: Design graphically (but don’t try to create software blueprints). Flowcharts, informal architecture diagrams, entity-relationship diagrams, and use case maps will all help people understand the overall design.
Rule 8: Design after the fact. The most important thing isn’t to follow any particular process, it’s to look as though you did so that the next person can retrace your steps.
Rule 9: Design with villains in mind, because security, privacy, and fairness can’t be sprinkled on after the fact.
Rule 10: Design collaboratively and inclusively, because it will produce a better design (and because it’s just the right thing to do).
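Rule 5 is easy to demonstrate in Python. The sketch below (the function and variable names are mine, not from the talk) treats a text-cleaning pipeline as a plain list of functions, i.e., as data that the program can inspect, reorder, or extend at runtime:

```python
# In Python, functions are ordinary values, so a processing pipeline
# can be represented as a list of steps rather than hard-coded calls.

def strip_whitespace(text):
    return text.strip()

def lowercase(text):
    return text.lower()

def collapse_spaces(text):
    return " ".join(text.split())

def run_pipeline(text, steps):
    """Apply each step in turn; the pipeline itself is just data."""
    for step in steps:
        text = step(text)
    return text

steps = [strip_whitespace, lowercase, collapse_spaces]
print(run_pipeline("  Hello   WORLD  ", steps))  # hello world
```

Because the pipeline is data, adding a step is a one-line change to the list, and the same `run_pipeline` function works unmodified, which is the elegance (and the newcomer-opacity) the rule warns about.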
TidyBlocks is a Scratch-like tool for doing basic data science.
Originally built by Maya Gans,
it was overhauled in 2020,
after which volunteers translated its interface into several different (human) languages.
We were excited by its potential, but:
1. We had reached the limits of what the Blockly toolkit could do
   without some serious extension work.
   (For example, there's no comprehensible way to represent joins
   using the available styles of blocks.)
2. Nobody was willing to fund further development.
The overhaul in 2020 took about 300-400 hours of volunteer time;
while I would have liked to continue,
I didn’t see a way forward without fixing #1 above,
and that couldn’t be done without financial support.
I still think the idea is a good one:
the user testing we did showed that the interface is immediately comprehensible
to anyone who has used Scratch
(which these days means most middle school kids and their teachers),
and after watching my daughter plod through their school’s “data literacy” module,
I think we need something better.
I hope someone, some day, will find a way to make it happen.
My mum would have been 94 today, and my sister would have been 57.
Mum knit every day for more than 80 years and Sylvia collected toy mice, so I got this.
I suspect both would have disapproved but secretly been pleased.