Many years ago
I was briefly part of a university group trying to get
better working conditions for grad students, post-docs, adjuncts,
and other members of academia’s petite bourgeoisie.
(And yes,
we were the sort of people who used terms like “petite bourgeoisie”
to show each other how clever we were.)
One Tuesday evening
an older gentleman showed up to one of our meetings.
He listened patiently as we talked,
then cleared his throat and said,
“You know, there’s probably an easier way to do this.
If you can get a meeting with the dean,
he might—”
“We’ve already tried talking to the dean,” someone said dismissively.
The guy nodded.
“I understand that,
but I think that if you say you want to talk about retention rather than—”
“The problem isn’t just retaining people,”
someone else immediately said.
“We need to broaden the intake.”
I think he tried to speak one more time,
only to be cut off in the same way.
As we all went in circles saying,
“Well actually, the real problem is…”
he quietly got up and went to the blackboard.
(We were meeting in an empty classroom,
and yes,
it was long enough ago that they still had blackboards.)
None of us noticed that he was writing,
but when the door closed behind him a few moments later,
we all saw the message he’d left behind:
You have just cut me off mid-sentence three times in less than a minute.
Based on that,
I don’t think a future built by you
will be better than what we have today.
We dismissed it, of course—I mean, hell, he’d been wearing a tie.
But I found out later that he was the first openly gay man
to hold an administrative position at that university,
and that he’d worked for over twenty years
to make admissions and promotions fairer.
Unfortunately,
I learned that from his obituary notice.
I still wonder what I could have learned from him
if I’d been less concerned about impressing people with how smart I was
and more willing to listen.
Thirty years later,
a billionaire named Marc Andreessen published a manifesto
with the intellectual depth and writing style
of something a freshman would throw together in a caffeine-fueled frenzy
after binging on right-wing podcasts [Andreessen2023].
Andreessen’s manifesto attacked anyone who thought technology should be regulated,
that its risks should be weighed against its benefits,
or that workers and communities affected by technological change should have a say in it.
Among the heroes he cited was F.T. Marinetti,
the Italian futurist who wrote in 1909 that war is “the world’s only hygiene”
and that civilization should be cleansed of feminism, democracy, and weakness.
Marinetti’s work inspired Benito Mussolini,
the founder of Fascism;
Andreessen is one of the most powerful venture capitalists in the world,
and is one of a growing number of big tech billionaires
who believe they can dispense with the people and the society
that made their success possible.
Andreessen’s manifesto is part of why I’m writing these essays.
His views are repugnant,
and I’m offended by how shallow and superficial his thinking is,
but the real reason is that most people in tech don’t know enough about how the world actually works
to have an immune response to his self-serving bullshit.
I studied engineering as an undergraduate and then became a programmer;
during and after my degrees,
I made more than my share of disparaging jokes about fluffy disciplines
like politics, sociology, and philosophy.
It took me a long time to admit that these are just as rigorous as math and physics,
and that most of my strongly-held beliefs were just
the opening moves in a chess game that others had been playing for centuries.
These essays are, in a way,
an extended apology to some of the people I sneered at
(but only some, because many were just as pretentious as I was).
Another reason I’m writing this is that I am sixty-three years old.
People my age run countries and make life-and-death decisions that affect millions of people.
As unbelievable as it seems to me,
somehow we are the grownups.
In another few years,
though,
we’ll be retired and you will be in charge.
I’d like you to be readier than I was,
and I don’t think more lessons about the Unix shell or version control are going to help.
But I don’t want you to understand the world
just so that I can sleep at night.
I want you to understand it so that you can make it better,
because that’s the greatest adventure of all time.
In the year I was born, most of the world’s people suffered under totalitarian rule,
judges could and did order electroshock therapy to “cure” homosexuals,
people could legally be denied jobs because of their skin color,
and women couldn’t open bank accounts without their husband’s permission.
Yes,
a lot of things are bad and/or getting worse,
but look at how far we’ve come.
Look at how many more choices you have than your grandparents did.
Look at how many more things you can know,
and be,
and enjoy.
And most importantly,
look at how many other people can too.
That didn’t happen by chance.
Every time you buy one brand of running shoe rather than another
or take a few minutes to vote
you are choosing one future over another.
Every time you help someone do something they couldn’t do before,
you are giving them more control over their own life.
The world doesn’t get better on its own.
It gets better because we make it better:
penny by penny,
vote by vote,
and one lesson at a time.
The climate crisis, mass extinction, surveillance capitalism,
inequality on a scale we haven’t seen in a century,
the re-emergence of racist nationalism:
my generation could have prevented all of it,
but decided that quarterly earnings were more important.
The bills for our cowardice, lethargy, and greed are now coming due;
as they do,
we have left you no easy solutions to these problems.
That doesn’t mean there are no solutions at all, though.
The essays that follow will explore a few things I wish I had known earlier:
where power comes from,
how it is used,
how its use is hidden,
and how people have held the powerful accountable and made the world a fairer place.
I’m not going to try to be comprehensive or even-handed,
but I hope you’ll find them informative, entertaining, and inspiring.
In order to understand how the world works,
we have to understand how people think.
That’s a tall order,
so the sections below focus on a few things that I’ve found particularly useful.
People Don’t Maximize Utility
In a 1981 paper,
Daniel Kahneman and Amos Tversky described a simple experiment.
They told participants that a disease was expected to kill 600 people
and asked them to choose between two public health programs.
Program A would save exactly 200 people.
Program B had a one-in-three chance of saving all 600 and a two-in-three chance of saving none.
Most people chose A, i.e., they preferred the certain outcome.
Then Kahneman and Tversky rephrased the choice.
Program C would result in exactly 400 deaths.
Program D had a one-in-three chance that nobody would die and a two-in-three chance that all 600 would die.
These are the same two options as A and B, described in terms of deaths rather than lives saved,
but this time, most people chose D.
Nothing changed except how the outcomes were described.
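For readers who want to check the arithmetic, here is a small Python sketch. The names are purely illustrative. It models each program as a list of (probability, lives saved) pairs, translates C and D from deaths out of 600 into lives saved, and uses exact fractions so that no rounding noise creeps in.

```python
from fractions import Fraction

TOTAL = 600  # people expected to die if nothing is done

# Hypothetical encoding: each program is a list of (probability, lives_saved).
programs = {
    "A": [(Fraction(1), 200)],
    "B": [(Fraction(1, 3), 600), (Fraction(2, 3), 0)],
    # C and D were described in deaths, so convert to lives saved.
    "C": [(Fraction(1), TOTAL - 400)],
    "D": [(Fraction(1, 3), TOTAL - 0), (Fraction(2, 3), TOTAL - 600)],
}

def expected_saved(outcomes):
    """Probability-weighted number of lives saved."""
    return sum(p * saved for p, saved in outcomes)

for name, outcomes in programs.items():
    print(name, expected_saved(outcomes))  # prints 200 for every program
```

All four programs come out at 200 expected survivors. A and C are the certain options, and B and D are the gambles, so any difference in what people choose has to come from the wording.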
Classical economics assumes that people are rational agents who consistently maximize their own utility.
When faced with a choice they weigh expected outcomes,
discount the future at a consistent rate,
and select whatever serves them best.
This has been proven false over and over again,
but persists because—well, we’ll get to that later in this essay.
Behavioral economics looks at how people actually make decisions,
and has repeatedly shown that they deviate from “rational” in predictable ways.
The first problem with the rational-agent model is computational.
Optimizing requires evaluating all possible options against all possible outcomes under all possible conditions.
No one can actually do this,
so instead,
people use a strategy that Herbert Simon called satisficing:
they search through available options until they find one that is good enough and then stop.
Simon called the broader idea bounded rationality:
people are rational within the limits of the information, time, and cognitive capacity they actually have,
which makes the heuristics people use to make decisions worth studying.
Kahneman and Tversky spent decades cataloguing people’s heuristics
and the cognitive biases they embody.
Anchoring is one of the most reliably reproduced findings in all of psychology.
When people estimate an unknown quantity,
their estimates are heavily influenced by numbers they have recently encountered,
even ones they know to be irrelevant.
In one study,
participants spun a wheel rigged to land on either 10 or 65,
then estimated the percentage of African countries in the United Nations.
Those who had seen 10 gave a median estimate of 25 percent;
those who had seen 65 gave a median estimate of 45 percent.
They believed the wheel was random,
but the number shaped their thinking anyway.
This isn’t stupidity or laziness;
it is the brain doing something that is sensible in most contexts.
Nearby numbers are usually informative,
so most of the time, it makes sense to rely on them.
This is why prosecutors set high anchor charges:
juries’ verdicts cluster around the opening number.
It is also why retailers display high “original” prices:
customers anchor to whatever is crossed out.
And research on salary negotiation consistently shows that
the person who names the first number has the advantage,
which is why negotiating advice boils down to the same instruction:
speak first.
The availability heuristic says that
people estimate how likely something is by how easily they can think of examples.
After a plane crash receives extensive media coverage,
people overestimate the risk of flying and underestimate the risk of driving,
even though the underlying statistics have not changed.
The availability heuristic is why catastrophic but rare events dominate public attention
while slow, diffuse harms are systematically underestimated,
which in turn is why it took decades to build public pressure
around tobacco, lead paint, and vehicle safety [Kahneman2011].
Prospect theory describes how people actually evaluate outcomes.
The key finding is loss aversion:
a loss of a given size produces roughly twice the emotional impact of an equivalent gain.
This asymmetry has practical consequences wherever people have a reference point they are trying to protect.
Studies of taxi drivers in New York, Singapore, and other cities show that
drivers work longer hours on bad days when earnings are below their daily target,
and knock off early on good days.
A rational agent who cares about total earnings would do the opposite,
working more hours when conditions are favorable and fewer when they are not.
Instead,
drivers are managing losses relative to a reference point,
not maximizing total income.
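The asymmetry at the heart of prospect theory can be made concrete with a single coin-flip bet. This is a minimal sketch that assumes a piecewise-linear value function where losses count exactly twice as much as gains, taken from the "roughly twice" figure. Real prospect-theory curves are also bent, and the sketch ignores that. The names `value` and `expected` are hypothetical.

```python
# Assumption: losses weigh twice as much as equal gains ("roughly twice").
# Real prospect-theory value functions are also nonlinear; this one is not.
LOSS_AVERSION = 2.0

def value(change):
    """Felt value of a gain or loss measured from a reference point."""
    return change if change >= 0 else LOSS_AVERSION * change

def expected(outcomes, utility=lambda x: x):
    """Probability-weighted sum of utility over (probability, change) pairs."""
    return sum(p * utility(x) for p, x in outcomes)

# A 50/50 bet: win $150 or lose $100.
coin_flip = [(0.5, 150), (0.5, -100)]

print(expected(coin_flip))         # 25.0: worth taking in money terms
print(expected(coin_flip, value))  # -25.0: feels like a losing bet
```

An agent who only counts money takes this bet. A loss-averse one turns it down, even though the odds are the same in both cases.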
Similarly,
the standard model predicts consistent discounting:
a reward next month should be worth a fixed percentage less than the same reward today,
and the same percentage should apply to any two adjacent future periods.
What people actually show is hyperbolic discounting:
an extremely steep preference for the present relative to any future point,
combined with much flatter preferences among future periods.
This is why someone can genuinely plan to quit smoking next year
while lighting a cigarette.
It is why gym memberships are purchased with full intention and then rarely used.
Our future selves are strangers, and we are generous to ourselves and stingy with strangers.
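The difference between the two kinds of discounting shows up as a preference reversal. The Python sketch below compares exponential discounting, where each week multiplies value by a fixed δ, with hyperbolic discounting, where value falls as 1/(1 + kt). The parameters are made up for illustration: δ = 0.9, k = 0.2 per week, and a choice between $100 and $110 one week later.

```python
def exponential(t, delta=0.9):
    """Constant-rate discounting: each extra week multiplies value by delta."""
    return delta ** t

def hyperbolic(t, k=0.2):
    """Hyperbolic discounting: value falls as 1 / (1 + k*t)."""
    return 1 / (1 + k * t)

def prefers_sooner(discount, delay, small=100, large=110, wait=1):
    """True if `small` at `delay` weeks beats `large` at `delay + wait` weeks."""
    return small * discount(delay) > large * discount(delay + wait)

for name, discount in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    today = prefers_sooner(discount, delay=0)
    next_year = prefers_sooner(discount, delay=52)
    print(f"{name}: sooner today={today}, sooner in a year={next_year}")

# exponential: sooner today=True, sooner in a year=True
# hyperbolic: sooner today=True, sooner in a year=False
```

With exponential discounting the choice comes out the same whether it is made today or a year ahead. With hyperbolic discounting, someone who plans to wait for the extra $10 next year takes the $100 once "next year" becomes "now." That is the cigarette problem in miniature.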
If small changes in how choices are presented can have large effects on behavior,
then choice architecture—the deliberate design
of decision environments—is itself a policy tool.
Thaler and Sunstein called one form of it nudging.
The canonical example is pension enrollment.
When workers must actively opt in to a pension plan,
participation rates are typically around 50 to 60 percent.
When workers are enrolled unless they actively opt out,
participation rises to 80 to 90 percent,
without any change to the financial terms.
The UK government introduced automatic pension enrollment in 2012;
by 2019,
over ten million additional workers had joined workplace pensions as a direct result.
The UK’s Behavioural Insights Team, established in 2010,
found that adding “Nine out of ten people in your area pay their taxes on time”
to letters sent to late taxpayers increased on-time payment rates by several percentage points.
The intervention cost essentially nothing and recovered tens of millions of pounds in additional revenue.
Nudges like that are not manipulation in the obvious sense: nothing is hidden and no options are removed.
But the line between a nudge and a shove depends entirely on whose interests the design serves.
Automatic enrollment in a pension plan serves the worker.
Automatic enrollment in a subscription that is difficult to cancel serves the company.
Variable reward schedules designed to maximize platform engagement
are also nudges, built on the same science, serving a different master.
Every infinite scroll, every notification badge,
every “people who liked this also liked” recommendation
is a behavioral economics intervention.
The field that began by documenting human irrationality
has become the primary toolkit for industrializing it [Thaler2009].
People Care About Fairness
Imagine you are given ten dollars to split with a stranger.
You can offer them any amount you like.
If they accept, you both keep your shares,
but if they reject the offer,
neither of you gets anything.
A purely self-interested stranger, according to classical economics,
should accept any positive offer—even one dollar—because one dollar is better than nothing.
When researchers ran this experiment across dozens of countries,
they found that offers below thirty percent of the total were rejected roughly half the time:
people would rather walk away with nothing
than accept an outcome they perceived as unfair.
In some communities, rejection rates were even higher.
This experiment, called the ultimatum game,
has been run so many times and reproduced so reliably
that its basic finding is no longer seriously contested.
People care about fairness,
punish violations of it at cost to themselves,
and do so even with strangers they will never see again.
This directly contradicts the assumptions that underlie most of modern economics,
much conservative political thought,
and a substantial proportion of technology design.
The idea that human beings are fundamentally self-interested did not emerge from evidence.
It emerged from political argument,
was later dressed in mathematical formalism,
and eventually achieved the status of dogma.
Thomas Hobbes, writing in 1651,
described the natural condition of humanity as “a war of all against all.”
Life without government, he argued, was “solitary, poor, nasty, brutish, and short.”
Hobbes wasn’t reporting on anthropology—he was making a political case for sovereign authority.
If humans are naturally predatory,
then the powerful state he wanted is the only alternative to chaos.
Two centuries later,
Herbert Spencer read Charles Darwin’s account of natural selection
and announced that it confirmed what Hobbes had suspected.
“Survival of the fittest” is Spencer’s phrase, not Darwin’s.
Darwin described differential reproductive success;
Spencer described a cosmic competition for dominance
in which helping the weak was a biological error.
Social Darwinism,
as this cluster of ideas became known,
provided intellectual cover for opposing labor rights and public health measures,
and for fighting almost any intervention that might protect people from market outcomes.
After all,
if the weak lost,
it was because Nature intended them to lose.
By the mid-twentieth century,
economists had turned these self-serving rationalizations into mathematics.
Homo economicus was a rational agent who consistently and accurately maximized his own utility.
He did not care about fairness or make mistakes;
he also didn’t care about other people’s welfare unless it affected his own.
(And yes, I’m using the male pronoun deliberately.)
The behavioral economics research described earlier shows that this is fiction.
The model is wrong not just about how people think
but about what they want.
The ultimatum game is one of dozens of experiments
that have been used to study human social preferences across cultures.
Public goods games ask participants to contribute to a shared fund that pays out to everyone,
including those who contribute nothing.
Standard economic theory predicts that rational individuals will free-ride
(i.e., contribute nothing while collecting their share of others’ contributions)
until the fund collapses.
In practice, initial contribution rates are typically between forty and sixty percent,
and when participants can identify and punish free-riders,
contribution rates rise and stay high.
People punish free-riders even when punishment costs them something.
What’s more,
they do it in one-shot interactions where there is no future reputation at stake.
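The free-riding prediction follows directly from the payoffs. This sketch uses the common linear version of the game, with numbers chosen as assumptions for easy arithmetic: four players, 20 tokens each, and a pot that is doubled and split evenly. Under those assumptions each token you contribute returns only half a token to you, so contributing always costs you personally even though it makes the group better off. The function name `payoffs` is hypothetical.

```python
def payoffs(contributions, endowment=20, multiplier=2):
    """Each player keeps what they didn't contribute plus an equal share
    of the multiplied pot, whether or not they contributed anything."""
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]

print(payoffs([20, 20, 20, 20]))  # [40.0, 40.0, 40.0, 40.0]
print(payoffs([0, 20, 20, 20]))   # [50.0, 30.0, 30.0, 30.0]
print(payoffs([0, 0, 0, 0]))      # [20.0, 20.0, 20.0, 20.0]
```

The lone free rider does best individually. Yet if everyone follows that logic, all four players end up with 20 tokens instead of 40.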
Samuel Bowles and Herbert Gintis spent years synthesizing this evidence,
arguing that humans evolved not just as individuals competing for resources
but as groups competing against other groups.
Cooperation within groups enforced by altruistic punishment of defectors was
a successful evolutionary strategy.
The capacity for that cooperation,
and the emotional responses that sustain it, such as a sense of fairness, shame, and indignation,
are as deeply embedded in human nature
as any appetite for self-interest [Bowles2011,Bregman2020].
None of this means humans are angels.
Self-interested behavior is real.
But so are cooperation, fairness, and the punishment of norm violations.
The question is which tendencies a given institutional design tends to elicit.
The Commons is Not a Tragedy
In 1968, the ecologist Garrett Hardin published an essay
in which he described a common pasture open to all herders.
Each one, acting rationally in their own interest,
would add animals to the pasture until it was destroyed.
The gains from each additional animal went to the individual herder,
but the costs of the degraded pasture were borne by all.
Self-interest would, inevitably, exhaust the commons.
The only solutions Hardin could see were privatization or state regulation.
“The Tragedy of the Commons” became one of the most cited papers in academic history.
It appeared in economics, political science, environmental policy, and law.
Its intellectual framework shaped fisheries policy, water rights law,
and debates about global climate governance [Hardin1968].
There was just one problem:
Hardin hadn’t studied any actual commons.
He had described an unmanaged commons with no rules, no governance, and no community.
The historically managed commons of medieval England,
the Alpine meadows of Switzerland,
the forest communities of Japan,
and the irrigation systems of Valencia and Bali all had elaborate rules developed over generations,
mechanisms for monitoring compliance,
graduated sanctions for violations,
and processes for resolving disputes.
They had been managing shared resources sustainably, in some cases, for centuries.
The real tragedy in Hardin’s work was his ignorance of how actual commons worked.
In contrast,
the political scientist Elinor Ostrom spent her career studying actual systems.
The picture that emerged was not a tragedy,
but a sophisticated diversity of institutions,
each one adapted to local conditions,
and each one solving the collective action problem
that Hardin had assumed was unsolvable without markets or states.
In Governing the Commons,
Ostrom identified eight design principles that successful self-governing commons tend to share:
Members have clearly defined rights to the resource.
Rules are adapted to local conditions rather than imposed from outside.
People affected by the rules have meaningful input into changing them.
A system exists for monitoring both the resource and the behavior of users.
Sanctions are graduated—minor violations draw minor consequences.
Conflicts can be resolved quickly and cheaply.
External authorities recognize the community’s right to self-organize.
Larger systems are built from nested smaller ones.
In 2009, Ostrom was awarded the Nobel Prize in Economics for her work.
The prize committee cited her demonstration that
“economic analysis can shed light on most forms of social organization.”
What her work actually demonstrated was narrower and more radical than that:
that communities could govern shared resources sustainably
without either privatizing them or handing them to the state,
and that the dominant theoretical model had failed to predict this
because it had assumed the wrong things about human nature.
If the evidence against homo economicus is this extensive,
why does the model retain such a hold on policy and institutional design?
Part of the answer is that the model is self-fulfilling in a useful way.
If you design a system that assumes people will free-ride,
you build in monitoring, penalties, and enforcement mechanisms.
Those mechanisms signal distrust,
which tends to erode the social norms that sustained voluntary cooperation.
People who might have contributed voluntarily
now respond to being treated as suspects.
The system that assumed selfishness produces the selfishness it expected.
In contrast,
Ostrom’s communities worked partly because the institutions expressed trust:
users had a voice in the rules,
sanctions were proportionate rather than punitive,
and the system treated people as members of a community rather than as threats to be managed.
Technology platforms have largely chosen the other path.
Terms of service are written for adversaries.
Moderation systems treat all users as potential bad actors.
Engagement optimization assumes that appetites can be exploited.
These choices reflect a theory of human nature,
and that theory has consequences—not just for the products built on it,
but for the kind of behavior those products elicit and reward.
Ostrom’s lesson is not that humans always cooperate.
It is that cooperation is a realistic outcome
if systems are designed to support it,
and that assuming the worst tends to prevent the better from occurring.
The tragedy of the commons was not inevitable.
It was what happened when community governance was absent.
Building that governance, it turns out, is something humans are rather good at,
so long as institutions give them room to try [Ostrom2015].
People Care About Appearances
In 2001, the Norwegian government made its tax records publicly searchable online,
so that every citizen could now look up what any other citizen earned.
This was not entirely new (the country’s tax data had theoretically been public for years), but
the internet made it frictionless.
Journalists could now comb through entire neighborhoods,
neighbors could check each other out,
and colleagues could compare their salaries with one another’s.
Ricardo Perez-Truglia used this moment as a natural experiment [PerezTruglia2020].
He tracked self-reported well-being before and after the records went online
and found that the gap in well-being between higher- and lower-income Norwegians widened by 29%.
Absolute incomes did not change;
what did was knowing how you compared to other people.
This is the central finding of research on social standing:
what people care about is not how much they have in absolute terms,
but where they stand relative to those around them.
It explains a long list of behaviors that seem irrational under standard economic assumptions.
Thorstein Veblen noticed this in 1899,
before there were smartphones or social media
(or economists to argue with his heresy).
The Theory of the Leisure Class
introduced the term conspicuous consumption
to describe spending whose primary purpose is to signal social rank [Veblen1899].
His key insight is that the signal only works if it is costly:
something that only the wealthy can afford communicates rank precisely because of its price.
Similarly,
in a world where most people have to do physical labor,
conspicuous leisure is only possible for the rich.
As leisure became more broadly available,
the signal shifted:
today,
being seen to be overworked and constantly in demand signals high status:
the business traveler at the airport in the expensive suit checking email at midnight
is the modern equivalent of the nineteenth-century aristocrat who demonstrably never lifted anything heavy.
“Being seen” may be the most important part of the previous sentence.
Invisible labor like housework,
mentoring junior colleagues,
or smothering your feelings for the benefit of others
has lower status.
It is usually dumped on women, members of minoritized groups, and the economically disadvantaged,
which creates a vicious circle.
Veblen pointed out that status competition is structurally self-defeating.
If I buy a larger house to signal rank and my neighbors respond by buying larger houses,
we have all spent money and all returned to the same relative position.
The competition is real but the gains are illusory;
the spending continues because the first person to stop loses ground to those who don’t.
Robert Frank built on Veblen’s work
with a careful study of wage patterns within firms [Frank1985].
Standard economics predicts that workers will always move toward higher absolute pay:
if they can earn more elsewhere, they will go elsewhere.
Frank found that this prediction fails systematically.
Workers at the bottom of a firm’s pay distribution are paid above their marginal productive value,
while workers at the top are paid below it.
The spread is not random:
it is consistent with workers accepting lower total pay in exchange for higher rank within their peer group.
The implication is that a programmer who is the highest-paid person on a small team
may prefer that position to being a lower-ranked member of a higher-paying team,
even though the move would mean a higher absolute salary.
This is not irrationality:
rank confers real benefits,
so trading some income for rank is a sensible exchange.
Standard economics fails to predict the trade only because it refuses to count rank as a good.
Frank’s local-rank argument helps explain the consistent finding in salary surveys
that the strongest correlate of worker satisfaction is not absolute pay
but pay relative to colleagues doing similar work.
Across many countries and industries,
fairness within the reference group matters more than the number itself.
Fred Hirsch introduced the concept of positional goods,
whose value depends on how many other people have them [Hirsch2015].
A house with an ocean view is a positional good:
if everyone had a house with an ocean view, the view would cease to confer distinction.
A senior job title,
a degree from a prestigious school,
or a table at an exclusive restaurant are all examples.
Hirsch pointed out that positional goods cannot be democratized.
Refrigerators and mobile phones can eventually be afforded by almost everyone,
and everyone genuinely benefits.
Positional goods cannot work this way.
For example,
if a prestigious university expands admissions to let in everyone who wants to attend,
its value signal collapses.
This is precisely what has happened with university degrees in wealthy countries since the 1960s.
When only a small fraction of the population held degrees,
a degree signaled something.
As participation rates rose from 5 percent to 50 percent,
the same degree began to signal much less,
so the game shifted to which university,
then to postgraduate qualifications,
then to increasingly specific institutional prestige.
Each generation has to spend more to achieve the same relative position as the previous one.
This is not a problem that can be solved by making university cheaper or more accessible:
that simply changes the positional good everyone is competing for.
The empirical case that rank rather than income drives well-being
has been built up over two decades.
An analysis of the British Household Panel Survey,
which tracked thousands of households over many years,
found that once income rank was included in the model,
absolute income had no statistically significant effect on life satisfaction [Boyce2010].
What predicted whether someone was satisfied with their life was
where they stood compared to their peers.
Wilkinson and Pickett extended this argument at the national level
with evidence that more unequal societies perform worse on almost every social indicator,
regardless of their average wealth [Wilkinson2011].
More equal societies have lower rates of homicide, mental illness, obesity, teenage pregnancy, and imprisonment.
They have higher rates of trust, social mobility, and life expectancy.
This pattern holds across wealthy countries:
the United States, the United Kingdom, and Portugal,
which are among the most unequal wealthy nations, perform poorly;
Japan, the Nordic countries, and the Netherlands, which are among the most equal, perform well.
The causal mechanism they propose is status anxiety:
higher inequality creates steeper hierarchies,
which produce more corrosive competition for rank.
Which brings us to social media.
Before digital platforms, status competition was based primarily on physical proximity:
you compared yourself to your neighbors, colleagues, and relatives.
Platforms have replaced that bounded reference group with a global feed
curated by algorithms optimized for engagement rather than accurate representation.
The comparison you are now offered is not with your actual neighbors.
It is with the most aspirational version of everyone you have ever met.
The result in South Korea, India, the UK, and Brazil
is an intensification of status anxiety without any corresponding change in absolute circumstances.
Someone whose life is objectively comfortable
can be made to feel inadequate by a platform that continuously serves them evidence
that other people are more attractive, better parents, or more widely traveled.
Social media platforms did not create the desire for status.
What they did was put that desire on a subscription model,
charge advertisers to place products in the resulting stream of anxiety,
and call the resulting business a social network.
It’s a game that only they can win.
Corporations are Psychopaths
In October 2011,
Michael Woodford received what should have been the best news of his career.
After thirty years working at Olympus,
the Japanese optics and medical equipment company,
he had been made chief executive:
the first non-Japanese person to run the company in its history.
Two weeks later, he was fired.
Woodford had found something wrong with a series of acquisitions the company had made.
The amounts paid were enormous,
the assets were nearly worthless,
and the accounting explanations made no sense.
He hired PricewaterhouseCoopers to look into it and took the resulting report to the chairman,
Tsuyoshi Kikukawa.
Kikukawa’s response was to call an emergency board meeting and vote Woodford out.
Woodford went public.
The Olympus board denied everything for a few weeks,
but then the numbers collapsed, Kikukawa resigned, and criminal charges followed.
The fraud—a sustained effort to conceal $1.7 billion in losses—had been running for nearly twenty years,
through the tenure of multiple CEOs.
What is notable about the Olympus scandal is not that individuals behaved dishonestly.
It is that they weren’t necessarily bad people.
It was as if the organization had developed a mind of its own,
and successive leaders served it rather than the other way around.
Several years before the Olympus scandal,
the Canadian legal scholar Joel Bakan asked Robert Hare,
the psychologist who had spent four decades developing
the clinical tools used to diagnose psychopathy,
to evaluate a publicly traded corporation against his Psychopathy Checklist
as if the corporation were a person [Bakan2005].
The checklist was designed to identify individuals who are:
callously indifferent to harm they cause to others,
skilled at charming and manipulating people around them,
incapable of genuine guilt or remorse,
unwilling to accept responsibility for their own failures, and
willing to lie and deceive when they believe they can get away with it.
Hare’s conclusion was that publicly traded corporations fit the profile.
In most jurisdictions,
corporate executives have a fiduciary duty to shareholders:
they are legally required to pursue shareholder interest,
and a board that sacrificed profit to benefit workers or communities
with no defensible business justification
could be held legally liable.
The resulting entity is therefore prohibited from having a conscience
in the way an individual person might.
It can behave ethically when ethics is good for the brand,
but not when the cost cannot be justified by future returns.
None of this requires any individual inside the organization to be a bad person.
It requires only that the rules governing the organization create incentives
that produce a certain kind of behavior.
However,
this argument becomes harder to sustain
when you look at who rises to the top of large organizations.
In 2005,
Belinda Board and Katarina Fritzon surveyed 39 senior managers and executives in the United Kingdom
and compared their psychological profiles to a matched group of patients at Broadmoor,
a high-security psychiatric hospital [Board2005].
The executives scored higher than the Broadmoor sample on three personality disorder traits:
histrionic, narcissistic, and compulsive.
The researchers called this pattern “successful psychopathy”:
the traits that lead to hospitalization or criminal conviction in their extreme form are,
in a milder and better-managed form,
associated with reaching senior management.
Paul Babiak and Robert Hare spent years studying
how psychopathic individuals navigate organizational environments [Babiak2019].
Their estimate is that
roughly one percent of the general population meets the clinical threshold for psychopathy,
while corporate managers cluster around three to four times that rate.
The mechanism is not mysterious.
Psychopaths tend to perform exceptionally well in job interviews.
They are confident, articulate, and skilled at saying what an interviewer wants to hear.
They feel no social anxiety in high-stakes situations,
and can fabricate credentials and relationships convincingly
because they feel no guilt about doing so.
Once hired,
they are evaluated primarily on how they appear to those above them in the organization,
and making a strong impression on a small number of people across a limited number of interactions
is something psychopaths do better than almost anyone else.
This is where a closely related concept becomes important:
impression management,
a term introduced by Erving Goffman in 1959 [Goffman1959].
His observation was that social life is fundamentally theatrical:
people perform different versions of themselves for different audiences,
and success in social situations depends heavily on managing those performances.
In a small organization where everyone works closely together over years,
this has limited scope because your actual behavior is too visible.
Colleagues know when you take credit for other people’s work,
when your confident predictions turn out wrong,
and when your charm disappears because you no longer need something from someone.
In a corporation with thousands of employees,
on the other hand,
promotions are typically decided by people who have less direct contact with the person in question.
They evaluate based on presentations, meetings, secondhand reports,
and the impressions formed in a relatively small number of interactions.
This is precisely the environment where impression management skills are most valuable,
and where the gap between managing impressions and actually performing well is hardest to detect.
Researchers who study the dark triad of psychopathy, narcissism, and Machiavellianism
have consistently found that individuals high in these traits do particularly well
in the early and middle stages of corporate careers.
Narcissists project confidence and vision.
Machiavellians are skilled at reading and exploiting organizational dynamics.
Psychopaths can absorb stress,
make decisions that harm others without losing sleep,
and deliver bad news without visible discomfort.
Each of these is a behavior that,
in moderation and over short time horizons,
looks like leadership.
Failure only occurs when a crisis requires something the dark triad cannot supply:
integrity,
honest self-criticism,
or concern for people the leader does not need.
The psychologist Kevin Dutton’s research on which professions attract the most psychopaths put CEO at the top of the list,
followed by lawyer, media professional, salesperson, and surgeon [Dutton2013].
What these jobs share is a combination of high stakes,
limited direct accountability,
and the need to remain calm under pressure:
conditions that reward precisely the traits psychopaths happen to have.
Wirecard was a German payments company whose rise was celebrated as a European technology success story.
By 2018 it had joined the DAX, Germany’s index of its thirty largest listed companies.
Its chief executive,
Markus Braun,
appeared at industry conferences as the model of a visionary, unflappable founder.
When the Financial Times published articles suggesting that
large portions of the company’s claimed revenue did not exist,
Germany’s financial regulator filed a criminal complaint against the journalist who wrote them.
When the fraud collapsed in 2020,
€1.9 billion turned out never to have existed.
Braun had not built a company.
He had built an extremely convincing impression of one.
South Korea’s chaebol are a structural variation on the same theme.
The heads of Samsung, Lotte, SK, and others
have faced criminal convictions for bribery and embezzlement—and received presidential pardons,
typically on the grounds that their imprisonment would harm the national economy.
This pattern of prosecution followed by pardon describes an organization
that has achieved something psychopathic at the institutional level:
the normal consequences of harmful behavior have been suspended
because the organization is too important to be held accountable.
None of this means that every large company is led by psychopaths,
or that organizational scale inevitably produces moral failure.
It means that the selection pressures of large hierarchies are not neutral.
Hiring processes that rely heavily on interviews systematically favor candidates who are good at interviews.
Promotion decisions made by people with limited direct observation
systematically favor candidates who are good at being observed,
and performance reviews based on self-assessment systematically favor
candidates who think highly of themselves.
These processes aren’t designed to select for the dark triad,
but they are all structured in ways that make dark triad traits an advantage.
The Olympus fraud ran for twenty years
because each successive layer of management found it easier to maintain the deception than to stop it.
No individual needed to be a psychopath;
the organization’s incentives reproduced psychopathic behavior regardless of who ran it.
Bad people come and go;
structures that reward bad behavior reproduce themselves.
Why Don’t People Just Say No?
In 2016, Wells Fargo disclosed that it had fired 5300 employees over the previous five years
for opening millions of fake accounts in customers’ names without their knowledge.
These were not executives:
they were branch staff, customer service representatives, and personal bankers.
When the scandal became public, initial coverage framed it as individual misconduct.
The problem with that framing was the number itself: 5300.
You cannot explain mass participation through individual bad character.
Instead, you have to ask
what conditions cause thousands of ordinary people to do something they know is wrong.
Stanley Milgram started asking this question in the early 1960s,
partly in response to the trial of the Nazi Adolf Eichmann,
who had organized the logistics of the Holocaust [Milgram1974].
Eichmann’s defense was that he had simply followed orders.
Hannah Arendt,
covering the trial for The New Yorker, coined a phrase that has not since been improved on:
the “banality of evil.”
Her point was that Eichmann was not a monster;
he was a bureaucrat doing what his organization told him he was supposed to do [Arendt2006].
Milgram wanted to test how far ordinary people would go.
In his experiments, volunteers were told the study was measuring the effect of punishment on learning.
An actor in another room pretended to receive electric shocks when giving wrong answers,
and subjects were instructed to increase the voltage with each error.
Most continued well past the point where the actor was screaming and, eventually, silent.
Two thirds of subjects administered what they believed were the maximum possible shocks.
(When the authority was absent or instructions were given by phone,
compliance dropped sharply.)
Milgram’s subjects were not sadists:
they were people responding to the combination of an authority figure,
a legitimate-seeming purpose,
and gradual escalation.
Corporate hierarchies reproduce all of these conditions.
The Wells Fargo employees were never actually instructed to defraud customers.
They were given sales quotas that were mathematically impossible to meet through legitimate means,
put under daily supervision,
and subjected to a culture in which the phrase “eight is great”
(i.e., eight products per customer)
was a daily mantra.
Each individual decision was small enough to feel manageable,
employees who raised concerns were sometimes fired,
and the cumulative result was fraud on a massive scale.
Albert Bandura spent decades studying what he called moral disengagement:
the psychological mechanisms by which people participate in harmful behavior
without experiencing it as harmful.
These include displacement of responsibility (“I was just following orders”),
diffusion of responsibility (“everyone else was doing it”),
euphemistic labeling (calling fake accounts “cross-selling solutions”),
and treating the people being harmed as abstractions rather than as individuals.
Bandura’s insight is that these mechanisms are not
rationalizations invented after the fact [Bandura1999].
They are available in advance,
and organizations learn to activate them.
When tech companies describe user data as “exhaust”,
call manipulative design patterns “engagement optimization”,
or frame advertising surveillance as “connecting people with relevant products”,
they are giving workers the vocabulary of moral disengagement
that makes it easier to do morally repugnant things [Palazzo2025].
The pattern repeats across industries and cultures.
Mitsubishi Motors covered up safety defects for over two decades,
with participation from engineers, quality controllers, and managers who each knew parts of the problem.
And then there’s Volkswagen’s “Dieselgate” scandal,
which became public in 2015 after researchers at West Virginia University found that
cars on the road emitted far more nitrogen oxides than official test results suggested [Ewing2017].
Engineers had written software that detected when a vehicle was undergoing an emissions test
and activated pollution controls that were switched off during normal driving.
Around eleven million cars worldwide contained this code,
across multiple model lines and product generations.
The engineers who wrote it had to design, maintain, and extend the software for years,
which required knowing exactly what it was for.
What changes the equation?
Milgram found that compliance fell dramatically when subjects could see other people refuse to continue.
In one variation, two confederates posing as fellow subjects defied the experimenter,
and only one real subject in ten went on to the maximum shock,
which shows that the social proof of refusal is as powerful as the social proof of compliance.
The implication is not that everyone needs to be a hero.
It is that if you want ethical behavior from an organization,
you only need a few visible dissenters.
In October 2021,
Frances Haugen,
a former Facebook product manager,
testified before the US Senate Commerce Committee
about thousands of internal company documents
she had secretly copied before resigning [Frenkel2021].
The Facebook Papers showed that
Facebook’s own researchers had found Instagram was worsening mental health among teenage girls,
that the platform’s recommendation algorithms amplified political outrage,
and that the company had repeatedly chosen not to act on these findings
when acting would have reduced engagement.
Haugen’s testimony prompted legislative proposals in several countries,
but produced no significant change to Facebook’s practices.
Facts Alone Don’t Change Minds
Most scientists’ and programmers’ implicit model of belief is roughly Bayesian:
when someone who believes something about the world receives new evidence,
they update their beliefs in the way that fits that evidence best.
This model is mostly true in domains that people aren’t emotionally invested in,
but fails in predictable ways for beliefs that are tied to group membership.
Research in social psychology has established that
beliefs about contested political and social issues
function primarily as signals of group identity
rather than as conclusions from evidence [SteinLubrano2024].
Holding the wrong belief does not just mean being misinformed:
it puts you outside the group,
so updating the belief means leaving that group.
The social cost of updating is therefore often higher than
the mental cost of staying wrong.
This is why the tobacco industry’s manufactured uncertainty was so effective:
it did not need to be persuasive on the merits.
It only needed to give people with strong social reasons not to update
a plausible excuse to stay where they were.
Motivated reasoning compounds this.
People do not evaluate arguments neutrally.
They are significantly better at identifying flaws
in arguments that lead to conclusions they dislike
than in arguments that lead to conclusions they support.
A trained scientist who is also a gun owner
will scrutinize studies on gun violence
more skeptically than studies on climate change
if their social circle treats the former as identity-threatening but not the latter.
This isn’t dishonesty in the ordinary sense because the person doesn’t know they’re doing it.
The biased scrutiny is real scrutiny;
the flaws they find are often genuine.
But the asymmetric attention means
they have reached a conclusion before their evaluation begins [Kahneman2011].
A colleague once told me that people want data, but believe stories.
This makes sense in light of motivated reasoning:
data provides cover for a decision already reached on other grounds,
while stories transmit the emotional and social context that actually drives belief change.
It is also why the most effective public health campaigns don’t focus on presenting statistics;
they present specific, named people in specific situations.
This is also why
industries that want to prevent people from changing their beliefs
fund think tanks that produce reports full of data.
The data itself is not meant to persuade anyone:
instead,
the sponsors are counting on the appearance of rigor,
which mimics legitimate evidence without its substance.
The fossil fuel industry has been running this operation for thirty years,
and tech companies are now doing it as well.
If we want AI, social media, and the software industry in general to be regulated in meaningful ways,
finding and presenting evidence of harm won’t be enough.
We need to change the social context in which beliefs are held
by finding trusted messengers within the relevant communities,
reframing the issue so that updating does not require identity betrayal,
and working through social networks rather than through arguments.
This is not manipulation:
it is taking the psychology of belief seriously [Achen2017,Hoffer2010].
High-priority jobs (class H) arrive frequently and are served quickly.
Low-priority jobs (class L) arrive rarely and take longer to serve.
The server always picks the highest-priority job available. Total server utilization $\rho = \rho_H + \rho_L < 1$, so the server has spare capacity on average. Yet low-priority jobs can wait far longer than the utilization level suggests they should.
Static Priority: Starvation at Moderate Load
With a static priority queue, high-priority jobs never yield to low-priority ones. Even when $\rho_H < 1$, high-priority bursts can lock out low-priority jobs for extended periods. The mean wait for low-priority jobs under a static non-preemptive priority queue is:
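$$W_L = \frac{R_0}{(1 - \rho_H)(1 - \rho_H - \rho_L)}$$
where $R_0$ is the mean residual work an arriving job finds in service. This diverges as $\rho_H \to 1$ independently of $\rho_L$. As $\rho_H$ approaches 100%, low-priority jobs wait arbitrarily long, even if only a few low-priority jobs ever arrive.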
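To see the divergence numerically, here is a minimal Python sketch of the two-class non-preemptive priority formula (Cobham's formula). It assumes Poisson arrivals and exponential service times, so each class's second moment is $2/\mu_i^2$. The function name `priority_waits` and the example rates are illustrative, not taken from the original simulation.

```python
def priority_waits(lam_h, mu_h, lam_l, mu_l):
    """Mean time in queue (W_H, W_L) for a non-preemptive two-class M/G/1
    priority queue, assuming exponential service so that E[S^2] = 2 / mu^2."""
    rho_h, rho_l = lam_h / mu_h, lam_l / mu_l
    if rho_h + rho_l >= 1:
        raise ValueError("total utilization must be below 1")
    # Mean residual work that an arriving job finds in service.
    r0 = 0.5 * (lam_h * 2 / mu_h**2 + lam_l * 2 / mu_l**2)
    w_h = r0 / (1 - rho_h)
    w_l = r0 / ((1 - rho_h) * (1 - rho_h - rho_l))
    return w_h, w_l


# Low-priority load stays fixed at rho_L = 0.1; only the high-priority load grows.
for lam_h in (0.5, 0.7, 0.8, 0.85, 0.88):
    w_h, w_l = priority_waits(lam_h, 1.0, 0.01, 0.1)
    print(f"rho_H={lam_h:.2f}  W_H={w_h:7.1f}  W_L={w_l:8.1f}")
```

The high-priority wait grows moderately while the low-priority wait explodes, even though only one low-priority job arrives per hundred time units.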
Aging: Solving Starvation Creates Oscillation
The standard remedy for starvation is priority aging: a waiting job’s priority improves over time until it eventually beats even high-priority arrivals. This guarantees finite wait for all jobs.
However, aging introduces a new pathology. When aged low-priority jobs finally burst through, they occupy the server and leave a backlog of high-priority jobs waiting. The high-priority queue then drains, and the cycle repeats — producing oscillating bursts rather than smooth, uniform service.
What aging does
Aging assigns each waiting L job a maximum patience time $T_{\max}$. After waiting $T_{\max}$, the job is promoted to high priority. This caps the worst-case wait: no L job can wait longer than $T_{\max}$ plus one service time.
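The sketch below is a small discrete-event simulation of a non-preemptive two-class queue with optional aging. It makes several assumptions of its own: arrivals are Poisson, services are exponential, aging is checked only when the server becomes free, and a promoted L job jumps ahead of every H job. The function `simulate_priority` and its parameters are hypothetical and are not the original tutorial's code.

```python
import random
from collections import deque


def simulate_priority(lam_h, mu_h, lam_l, mu_l, t_max=None,
                      horizon=100_000.0, seed=1):
    """Mean queueing delay per class in a non-preemptive two-class queue.

    If t_max is given, an L job that has already waited t_max when the
    server frees up is served before any H job (priority aging).
    """
    rng = random.Random(seed)

    # Pre-generate (arrival time, class, service time) for both classes.
    arrivals = []
    for cls, lam, mu in (("H", lam_h, mu_h), ("L", lam_l, mu_l)):
        t = rng.expovariate(lam)
        while t < horizon:
            arrivals.append((t, cls, rng.expovariate(mu)))
            t += rng.expovariate(lam)
    arrivals.sort()

    queues = {"H": deque(), "L": deque()}
    waits = {"H": [], "L": []}
    clock, i = 0.0, 0
    while i < len(arrivals) or queues["H"] or queues["L"]:
        if not queues["H"] and not queues["L"]:
            clock = max(clock, arrivals[i][0])  # idle: jump to next arrival
        while i < len(arrivals) and arrivals[i][0] <= clock:
            t, cls, s = arrivals[i]
            queues[cls].append((t, s))
            i += 1
        # Scheduling decision: an aged L job first, else H, else L.
        aged = (t_max is not None and queues["L"]
                and clock - queues["L"][0][0] >= t_max)
        cls = "L" if aged or not queues["H"] else "H"
        t, s = queues[cls].popleft()
        waits[cls].append(clock - t)
        clock += s  # non-preemptive: the chosen job runs to completion
    return {c: sum(w) / len(w) for c, w in waits.items()}


for t_max in (None, 50.0, 20.0, 5.0):
    w = simulate_priority(0.8, 1.0, 0.01, 0.1, t_max=t_max)
    print(f"T_max={t_max}:  W_H={w['H']:.1f}  W_L={w['L']:.1f}")
```

Every run uses the same seed, so each setting of `t_max` sees identical arrivals. Shrinking `t_max` should pull `W_L` down and push `W_H` up, which is the trade-off discussed under "The trade-off".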
Practical Implications
Priority queues appear throughout computing:
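OS scheduling: interactive processes (high priority) vs. batch jobs (low priority). Linux's Completely Fair Scheduler (and its EEVDF successor) avoids starvation by favoring tasks that have received less than their weighted share of CPU time, with nice values setting the weights; older Linux schedulers used explicit sleep bonuses, a form of dynamic priority aging.
Network QoS: real-time traffic (VoIP, video) vs. bulk data. Fair-queuing schedulers such as Deficit Round Robin (DRR) or Weighted Fair Queuing (WFQ) guarantee bandwidth shares without starvation.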
Database query planning: short OLTP queries vs. long OLAP queries. Resource groups and query timeouts implement a form of aging.
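As an illustration of how a fair-queuing scheduler avoids starvation, here is a minimal sketch of the deficit-counter idea behind Deficit Round Robin. The flow names, packet sizes, and quanta are hypothetical. Production schedulers add details this sketch omits, such as an active-flow list and per-packet arrival handling.

```python
from collections import deque


def deficit_round_robin(flows, quantum):
    """Return the (flow, size) order in which packets are sent under DRR.

    flows maps a flow name to a FIFO list of packet sizes in bytes;
    quantum maps a flow name to the bytes of credit it earns per round.
    """
    queues = {name: deque(sizes) for name, sizes in flows.items()}
    deficit = {name: 0 for name in queues}
    sent = []
    while any(queues.values()):
        for name, q in queues.items():
            if not q:
                continue
            deficit[name] += quantum[name]  # earn this round's credit
            # Send head-of-line packets while the credit covers them.
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not q:
                deficit[name] = 0  # an emptied flow may not bank credit
    return sent


# Under strict priority the bulk packets would wait for all 20 interactive ones.
order = deficit_round_robin(
    {"interactive": [500] * 20, "bulk": [500] * 2},
    {"interactive": 1000, "bulk": 500},
)
print([name for name, _ in order][:6])
# ['interactive', 'interactive', 'bulk', 'interactive', 'interactive', 'bulk']
```

The interactive flow still gets twice the bandwidth because its quantum is twice as large. The bulk flow is nonetheless guaranteed a share in every round, so it can never be starved.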
Understanding the Math
Mean wait for two-priority queues
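Let $\lambda_i$, $\mu_i$, and $\rho_i = \lambda_i / \mu_i$ be the arrival rate, service rate, and utilization of class $i \in \{H, L\}$. For a non-preemptive priority queue, Cobham's formula gives the mean waits:
$$W_H = \frac{R_0}{1 - \rho_H}, \qquad W_L = \frac{R_0}{(1 - \rho_H)(1 - \rho_H - \rho_L)}$$
where $R_0 = \tfrac{1}{2}(\lambda_H \overline{s_H^2} + \lambda_L \overline{s_L^2})$ is the mean residual work seen by an arriving customer and $\overline{s_i^2}$ is the second moment of class $i$’s service time. The ratio $W_L / W_H = 1/(1 - \rho_H - \rho_L) = 1/(1 - \rho)$ grows without bound as the total load approaches 1, which is unavoidable as $\rho_H \to 1$.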
Why “on average” is not enough
Even when $\rho < 1$, randomness creates bursts of H arrivals. During a burst, the server is continuously occupied by H jobs, and L jobs must wait in the background. The mean wait for low-priority jobs is:
$$W_L = \frac{R_0}{(1 - \rho_H)(1 - \rho_H - \rho_L)}$$
The critical factor is $(1 - \rho_H)$ in the denominator. As $\rho_H \to 1$, this factor approaches zero and $W_L \to \infty$, even if $\rho_L$ stays small. The high-priority load alone is enough to starve the low-priority class.
The trade-off
Without aging, $W_L$ can be infinite when $\rho_H$ is large. With aging, $W_L \leq T_{\max} + 1/\mu_L$, but during promotion events the effective $\rho_H$ spikes temporarily, increasing $W_H$. Choosing $T_{\max}$ is a design decision: a small $T_{\max}$ protects L jobs but forces more promotions and penalizes H jobs more often; a large $T_{\max}$ is kinder to H jobs but allows L jobs to wait longer. There is no setting that simultaneously minimizes both — the trade-off is fundamental.
This article was originally written for marimo.io.
A city has two routes from source $S$ to destination $T$:
Top route $S \to A \to T$: link $SA$ is congestion-dependent; link $AT$ has a fixed travel time.
Bottom route $S \to B \to T$: link $SB$ has a fixed travel time; link $BT$ is congestion-dependent.
The network is symmetric. A city planner proposes adding a new shortcut link $A \to B$ with near-zero travel time, creating a third route $S \to A \to B \to T$. To her surprise, adding the shortcut makes everyone’s travel time longer at the selfish-routing Nash equilibrium.
Without the shortcut
Both routes are symmetric. In equilibrium, traffic splits evenly. If $N/2$ drivers use each route, the congested links have delay $\alpha \cdot n$ (where $n$ is the number of cars on the link), and the fixed links take time $c$:
$$t_{\text{top}} = \frac{N}{2}\alpha + c = t_{\text{bottom}}$$
With the shortcut $A \to B$
Each driver thinks, “Link $AB$ is free; I can use $SA$, slip across to $B$, then take $BT$ instead of the slow constant link $AT$.” All $N$ drivers make this choice. The Nash equilibrium has everyone on $S \to A \to B \to T$:
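$$t_{\text{shortcut}} = N\alpha + \varepsilon + N\alpha = 2N\alpha + \varepsilon$$
where $\varepsilon \approx 0$ is the shortcut’s travel time. Everyone stays on the shortcut only if switching back to $S \to A \to T$, which would cost $N\alpha + c$, does not help, and that requires $c \ge N\alpha + \varepsilon$. Whenever $c < \tfrac{3}{2}N\alpha$ as well, $2N\alpha > \frac{N}{2}\alpha + c$, so travel times increase after the road is added. This is the paradox: individually rational decisions produce a collectively worse outcome. The ratio of Nash equilibrium cost to the socially optimal cost is called the price of anarchy.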
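Plugging illustrative numbers into these formulas shows the paradox concretely. The parameters (4000 drivers, $\alpha = 0.01$, $c = 45$) are assumptions chosen for the example, and `braess` is a hypothetical helper that evaluates only the two equilibria described above.

```python
def braess(n, alpha, c, eps=0.0):
    """Equilibrium travel times in the two-route Braess network.

    Congested links (SA, BT) cost alpha * cars on the link; fixed links
    (AT, SB) cost c; the shortcut AB costs eps.
    """
    before = (n / 2) * alpha + c          # even split over the two routes
    after = n * alpha + eps + n * alpha   # everyone on S -> A -> B -> T
    # All-shortcut is a Nash equilibrium only if one driver cannot gain by
    # switching back to S -> A -> T, which would cost n * alpha + c.
    stable = after <= n * alpha + c
    return before, after, stable


before, after, stable = braess(n=4000, alpha=0.01, c=45)
print(before, after, stable, after / before)  # roughly 65, 80, True, 1.23
```

With $c = 45$, the all-shortcut pattern is stable, and every driver's trip grows from about 65 to about 80 time units. If $c$ drops below $N\alpha = 40$, the function reports that the all-shortcut pattern is no longer an equilibrium.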
Braess’s paradox is not theoretical. Seoul, Stuttgart, and New York all observed traffic improvements after closing roads. Conversely, new roads in highly congested networks have sometimes worsened average travel times.
Understanding the Math
Nash equilibrium
A Nash equilibrium is a situation where every player has chosen a strategy and no single player can improve their own outcome by switching to a different strategy so long as everyone else stays put. Think of it as a stable fixed point: if you woke up one morning in a Nash equilibrium, you would have no reason to change what you are doing. Crucially, a Nash equilibrium need not be the best possible outcome for everyone collectively.
The paradox, step by step
Label the number of cars $N$ and suppose the congested links have delay $\alpha \cdot n$ where $n$ is the number of cars currently using that link. Without the shortcut, traffic splits evenly: $N/2$ cars use each route. Each driver’s travel time is $(N/2)\alpha + c$, where $c$ is the fixed delay on the non-congested link. Neither route is faster than the other, so no driver wants to switch; that is a Nash equilibrium.
Now add the shortcut $A \to B$ with near-zero travel time $\varepsilon$. A single driver considering a switch reasons: “Link $AB$ is essentially free. If I take $SA$, cross to $B$, and take $BT$, I avoid the fixed cost $c$.” If that driver is the only one to switch, it looks cheaper. But every driver makes the same calculation simultaneously. At the new equilibrium, all $N$ drivers pile onto $SA$ and $BT$:
$$t_{\text{shortcut}} = N\alpha + \varepsilon + N\alpha = 2N\alpha + \varepsilon$$
Since $2N\alpha > (N/2)\alpha + c$ whenever $c < \tfrac{3}{2}N\alpha$, everyone is worse off than before the shortcut was built.
The price of anarchy
The social optimum would split traffic evenly at cost $(N/2)\alpha + c$, but selfish routing delivers $2N\alpha + \varepsilon$. The price of anarchy exceeds 1, meaning individual rationality destroys collective welfare.
The Prisoner’s Dilemma is the best-known example of this tension. Two suspects each choose independently to cooperate or defect. Defecting is a dominant strategy: it is better for you regardless of what the other person does. Yet if both defect, both get a worse outcome than if both had cooperated. Braess’s paradox is the same logic scaled to $N$ drivers.
The logit model
The simulation uses a probabilistic choice rule: the probability a driver picks route $r$ is proportional to $\exp(-\beta \cdot t_r)$, where $t_r$ is the expected travel time on route $r$ and $\beta$ is a sensitivity parameter. When $\beta$ is large, drivers strongly prefer the fastest route and the outcome approaches the pure Nash equilibrium. When $\beta$ is small, drivers choose nearly randomly and the paradox weakens. The parameter $\beta$ captures how responsive real drivers are to time differences.
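One way to compute the steady state of this logit rule is the method of successive averages. This is a standard technique for logit (stochastic user) equilibria, but it is an assumption here and may differ from how the original simulation updates its drivers. The sketch reuses the illustrative Braess numbers (4000 drivers, $\alpha = 0.01$, $c = 45$). The function names are hypothetical.

```python
import math


def route_times(flows, alpha, c, eps):
    """Travel times on [top S-A-T, bottom S-B-T, shortcut S-A-B-T]."""
    top, bottom, short = flows
    sa = top + short        # cars on congested link S->A
    bt = bottom + short     # cars on congested link B->T
    return [alpha * sa + c, c + alpha * bt, alpha * sa + eps + alpha * bt]


def logit_equilibrium(n, alpha, c, beta, eps=0.0, iters=5000):
    """Route flows where P(route r) is proportional to exp(-beta * t_r)."""
    flows = [n / 3] * 3
    for k in range(1, iters + 1):
        times = route_times(flows, alpha, c, eps)
        best = min(times)  # subtract the minimum so exp() stays well scaled
        weights = [math.exp(-beta * (t - best)) for t in times]
        total = sum(weights)
        target = [n * w / total for w in weights]
        # Take a shrinking step toward the logit response to current times.
        flows = [f + (g - f) / k for f, g in zip(flows, target)]
    return flows, route_times(flows, alpha, c, eps)


for beta in (0.01, 0.1, 1.0):
    flows, times = logit_equilibrium(4000, 0.01, 45, beta)
    share = flows[2] / 4000
    mean_t = sum(f * t for f, t in zip(flows, times)) / 4000
    print(f"beta={beta}: shortcut share={share:.2f}, mean time={mean_t:.1f}")
```

At large $\beta$, almost everyone takes the shortcut and the mean travel time approaches the Nash value. At small $\beta$, traffic spreads over all three routes and the paradox largely disappears.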
This article was originally written for marimo.io.
$N$ commuters all want to leave for work at the same preferred time. The road has a fixed capacity: up to $C$ commuters per time slot travel quickly, but when more than $C$ try to leave in the same slot, everyone in that slot experiences extra delay proportional to the overload.
Each day, commuters observe yesterday’s travel times and shift their departure by one slot toward a less congested option with some probability. Much to their disappointment, the rush hour never disappears. Instead it:
flattens slightly (spreading across more slots), but
shifts its peak position over successive days, and
reaches a new quasi-equilibrium that may be no less congested than the original, just at a different time.
The intuition is that any slot that becomes less congested immediately attracts new commuters from adjacent overloaded slots, refilling it. Individual optimization is self-defeating in aggregate.
The simulation in this tutorial shows emergent dynamics:
The arrival distribution begins concentrated at the preferred slot.
Commuters shift away from congested slots, spreading the peak.
The spreading creates new local peaks at adjacent slots, which then attract their own shifters.
Over many days the distribution oscillates or drifts without converging to zero congestion.
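The day-to-day dynamics above can be sketched in a few lines of Python. The cost model is an assumption of this sketch. Congestion delay grows linearly with overload, as the text describes, and a schedule penalty for leaving away from the preferred slot is added. Without that penalty, commuters could simply spread out until congestion vanished. All parameter values and the function name `simulate_commute` are hypothetical.

```python
import random


def simulate_commute(n=1000, slots=31, preferred=15, capacity=60,
                     delay_per_extra=0.1, schedule_cost=1.0,
                     p_shift=0.1, days=200, seed=0):
    """Return per-day departure counts (one list of length `slots` per day)."""
    rng = random.Random(seed)
    slot_of = [preferred] * n  # day 0: everyone leaves at the preferred time
    history = []
    for _ in range(days):
        counts = [0] * slots
        for k in slot_of:
            counts[k] += 1
        history.append(counts)
        # Cost of each slot as observed today: congestion delay plus an
        # assumed penalty for departing away from the preferred slot.
        cost = [delay_per_extra * max(0, counts[k] - capacity)
                + schedule_cost * abs(k - preferred) for k in range(slots)]
        for i, k in enumerate(slot_of):
            if rng.random() >= p_shift:
                continue  # most commuters keep yesterday's habit
            neighbours = [j for j in (k - 1, k + 1) if 0 <= j < slots]
            cheapest = min(cost[j] for j in neighbours)
            if cheapest < cost[k]:  # move one slot toward a cheaper option
                slot_of[i] = rng.choice(
                    [j for j in neighbours if cost[j] == cheapest])
    return history


cap = 60
history = simulate_commute(capacity=cap)
for day in (0, 10, 50, 199):
    counts = history[day]
    peak = max(range(len(counts)), key=counts.__getitem__)
    overloaded = sum(c > cap for c in counts)
    print(f"day {day:3d}: peak slot {peak}, load {counts[peak]}, "
          f"overloaded slots {overloaded}")
```

The peak load falls quickly in the first few days. Under this cost model, however, the slots near the preferred time stay above capacity, because each slot that empties becomes attractive again.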
The Vickrey Bottleneck Model
The classic model (Vickrey 1969) treats the road as a bottleneck with flow rate $s$ vehicles per unit time. At equilibrium, every commuter faces the same generalized cost:
$$C = \alpha d + \beta \max(0,\, t^* - t) + \gamma \max(0,\, t - t^*)$$
where $d$ is queuing delay, $\alpha$ is the cost per unit of time spent queuing, $t$ is the actual arrival time, $t^*$ is the desired arrival time, and $\beta, \gamma$ are schedule-delay costs for early and late arrival respectively. Vickrey showed that at Nash equilibrium a departure queue forms with length that rises and then falls as commuters spread across time to equalize cost, but total system delay is unchanged.
This model underlies modern road-pricing schemes: a time-varying toll that charges each arrival time exactly the queuing cost it would otherwise carry eliminates queuing entirely while leaving each commuter’s total cost (toll plus schedule delay) unchanged. In essence, the toll revenue replaces the wasted queuing time.
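The standard closed-form results for Vickrey's model make the toll argument concrete. For $N$ identical commuters, the equilibrium cost per commuter is $\delta N / s$ with $\delta = \beta\gamma/(\beta + \gamma)$. Without a toll, that cost is split evenly between queuing and schedule delay. The sketch assumes the usual condition that queuing is costlier than arriving early ($\alpha > \beta$), under which $\alpha$ drops out of the results. The function name and example numbers are hypothetical.

```python
def vickrey_bottleneck(n, s, beta, gamma):
    """Closed-form totals for the Vickrey bottleneck with n identical
    commuters, capacity s per unit time, and schedule-delay costs beta
    (early) and gamma (late) per unit time."""
    delta = beta * gamma / (beta + gamma)
    price = delta * n / s   # cost per commuter, the same with or without toll
    total = price * n
    no_toll = {"queuing": total / 2, "schedule_delay": total / 2}
    optimal_toll = {"queuing": 0.0, "schedule_delay": total / 2,
                    "toll_revenue": total / 2}
    return price, no_toll, optimal_toll


# Hypothetical rush hour: 6000 commuters, capacity 3000 per hour,
# early cost 5 per hour, late cost 20 per hour.
price, no_toll, with_toll = vickrey_bottleneck(6000, 3000, 5.0, 20.0)
print(price, no_toll, with_toll)
```

Each commuter pays the same total either way, but the toll converts the queuing half of the cost into revenue instead of wasted time.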
Understanding the Math
What is a congestion game?
Each commuter (the “player”) independently chooses a departure time slot. The delay experienced in any given slot depends on how many other commuters choose the same slot: if the slot is over capacity $C$, delay grows with the number of extra commuters. No central authority coordinates choices. This structure, where each player’s cost depends on the collective choices of all players, is called a congestion game.
Nash equilibrium in this context
A Nash equilibrium is a distribution of departure times such that no individual commuter can reduce their own delay by unilaterally switching to a different slot. At equilibrium, every occupied slot has the same congestion-adjusted cost. If slot 15 were cheaper than slot 14, commuters from slot 14 would shift to slot 15 until the costs equalized. The equilibrium is therefore defined by: all slots with commuters in them have equal cost, and all empty slots have cost no lower than the occupied ones.
Why Nash equilibrium is not the social optimum
The social optimum minimizes total delay summed over all commuters. The Nash equilibrium minimizes each person’s individual delay given everyone else’s choices. These are generally different objectives. At Nash equilibrium, a commuter choosing a crowded slot ignores the extra delay they impose on every other commuter already in that slot. They feel only their own delay; the cost they impose on others is a negative externality that they do not internalize.
Why the peak shifts but does not vanish
Suppose slot 15 is heavily congested. Some commuters shift to slot 14, relieving slot 15. But now slot 14 is more congested, so its commuters shift to slot 13. The congestion wave ripples outward in both directions. Meanwhile, commuters who shifted away from slot 15 now observe it as less congested and some drift back. The system never reaches zero congestion: it perpetually redistributes congestion across nearby slots in a slow drift. The Nash equilibrium exists in theory, but the day-by-day best-response dynamics cycle around it rather than converging to it, particularly when commuters respond noisily to yesterday’s conditions.
This article was originally written for marimo.io.
A single server processes jobs that arrive randomly according to a Poisson process. Most jobs are quick (exponential service with small mean), but a rare few are very slow (exponential service with large mean). This hyperexponential service distribution has high variance. This post compares the performance of two scheduling disciplines in this situation:
FIFO (First In, First Out): jobs are served in the order they arrive.
SJF (Shortest Job First): the server always picks the shortest queued job next.
The surprising result is that SJF dramatically outperforms FIFO: not just for the small jobs that directly benefit from skipping ahead, but also for mean sojourn time across all jobs. The improvement is most visible at the tail (95th and 99th percentiles) because FIFO creates a convoy effect: one long job blocks many short jobs behind it, inflating everyone’s wait.
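The comparison can be reproduced with a small simulation. This sketch assumes Poisson arrivals at rate 0.8 and a hyperexponential service distribution with mean 1: 90% of jobs are exponential with mean 0.5, and 10% are exponential with mean 5.5. These are illustrative parameters, not necessarily those of the original tutorial. Both policies replay exactly the same jobs, so the difference comes only from the scheduling order.

```python
import heapq
import random


def simulate(policy, lam=0.8, n_jobs=100_000, seed=7):
    """Sojourn times (wait + service) in a single-server queue.

    policy is "fifo" or "sjf" (non-preemptive shortest job first).
    With mean service time 1, utilization equals lam.
    """
    rng = random.Random(seed)
    jobs, t = [], 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)
        mean = 0.5 if rng.random() < 0.9 else 5.5  # hyperexponential mix
        jobs.append((t, rng.expovariate(1 / mean)))

    heap, sojourns, clock, i = [], [], 0.0, 0
    while i < n_jobs or heap:
        if not heap:
            clock = max(clock, jobs[i][0])  # server idle: jump ahead
        while i < n_jobs and jobs[i][0] <= clock:
            arrival, size = jobs[i]
            key = size if policy == "sjf" else arrival
            heapq.heappush(heap, (key, arrival, size))
            i += 1
        _, arrival, size = heapq.heappop(heap)  # runs to completion
        clock += size
        sojourns.append(clock - arrival)
    return sojourns


def summary(xs):
    xs = sorted(xs)
    return sum(xs) / len(xs), xs[int(0.95 * len(xs))], xs[int(0.99 * len(xs))]


for policy in ("fifo", "sjf"):
    mean, p95, p99 = summary(simulate(policy))
    print(f"{policy}: mean={mean:.2f}  p95={p95:.2f}  p99={p99:.2f}")
```

Because the only change is the heap key (arrival time versus job size), any gap between the two summaries comes directly from the convoy effect described below.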
The Convoy Metaphor
Picture a one-lane road with one slow truck and many fast cars. Every car behind the truck must drive at truck speed; no overtaking allowed. The truck is the long job; the cars are the short jobs stuck behind it in FIFO order. SJF is like a passing lane: fast cars jump ahead of the truck and reach their destination much sooner. The truck itself may arrive somewhat later, since it now yields to every car, but the total delay experienced by all vehicles plummets.
Why FIFO Hurts with High Variance
In FIFO, the server’s current job is chosen at arrival time, not at decision time. When a slow job begins service, every subsequent arrival must join the queue and wait. The expected excess work in service (the remaining time of the current job, seen by an arriving customer) under FIFO is:
$$E[R] = \frac{\lambda \overline{s^2}}{2}$$
where $\overline{s^2}$ is the second moment of service time. High variance inflates $\overline{s^2}$ without changing $\rho$, directly worsening wait time.
SJF Minimises Mean Sojourn Time
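For a single server with Poisson arrivals, non-preemptive SJF, and any continuous service-time distribution $F$ (such as the hyperexponential here), a job of size $x$ waits on average
$$E[W(x)] = \frac{\lambda E[S^2]/2}{(1 - \rho_x)^2}, \qquad \rho_x = \lambda \int_0^x t \, dF(t),$$
so the mean sojourn time is $E[T] = E[S] + \int_0^\infty E[W(x)] \, dF(x)$. The numerator is the same residual-work term that appears in the Pollaczek–Khinchine formula discussed in “Understanding the Math” at the end of this lesson.
Among non-preemptive policies, SJF achieves the minimum mean sojourn time because short jobs that would otherwise be blocked by a long job are promoted ahead, reducing the number of jobs kept waiting.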
Practical Relevance
Operating system CPU schedulers use time-quanta and priority aging to approximate SJF without knowing job sizes in advance. Database query planners estimate query cost and reorder execution to minimize blocking. The phenomenon reappears as head-of-line blocking in HTTP/1.1 (one slow response stalls a connection), motivating HTTP/2 multiplexing and HTTP/3’s QUIC stream independence.
Understanding the Math
The second moment
For a random variable $S$ representing service time, the second moment is $E[S^2]$. Recall from your statistics course that variance is $\text{Var}(S) = E[S^2] - (E[S])^2$, which rearranges to:
$$E[S^2] = \text{Var}(S) + (E[S])^2$$
This means high variance inflates $E[S^2]$ even if the mean $E[S]$ stays fixed. Doubling the standard deviation of service times quadruples the variance term, so $E[S^2]$ nearly quadruples whenever the variance dominates $(E[S])^2$, even with the same average service time.
Why variance of service time hurts
Imagine a FIFO server handling jobs that are either 0.1 minutes or 10 minutes long, with 90% being short and 10% being long. The mean service time is $0.9 \times 0.1 + 0.1 \times 10 = 1.09$ minutes, so utilization $\rho = \lambda / \mu$ might be modest. But when a 10-minute job starts, every job arriving during those 10 minutes must join the queue and wait. The larger $E[S^2]$, the more average work sits ahead of each arriving job.
The Pollaczek–Khinchine formula
The mean time a job spends waiting (not counting its own service time) in a FIFO single-server queue is:
$$W_q = \frac{\lambda E[S^2]}{2(1 - \rho)}$$
Here $\lambda$ is the arrival rate, $E[S^2]$ is the second moment of service time, and $\rho = \lambda \cdot E[S]$ is the server utilization. Both $\lambda$ and $E[S^2]$ appear in the numerator, so more variance means more waiting even at the same $\rho$. The $(1-\rho)$ denominator is the familiar blow-up term from M/M/1.
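Applying the formula to the 90%/10% example above shows how much the variance costs. The arrival rate is a hypothetical choice that sets $\rho = 0.5$. The sketch compares the bimodal workload against a constant service time with the same 1.09-minute mean. The helper `pk_wait` is illustrative.

```python
def pk_wait(lam, mean_s, second_moment):
    """Pollaczek-Khinchine mean waiting time in an M/G/1 FIFO queue."""
    rho = lam * mean_s
    if rho >= 1:
        raise ValueError("queue is unstable: rho must be below 1")
    return lam * second_moment / (2 * (1 - rho))


# The 90% / 10% example: jobs take 0.1 or 10 minutes.
p_short, short, long_ = 0.9, 0.1, 10.0
mean_s = p_short * short + (1 - p_short) * long_          # 1.09 minutes
second = p_short * short**2 + (1 - p_short) * long_**2    # about 10.01 minutes^2
lam = 0.5 / mean_s  # hypothetical arrival rate chosen so that rho = 0.5

w_mixed = pk_wait(lam, mean_s, second)
w_fixed = pk_wait(lam, mean_s, mean_s**2)  # same mean, zero variance
print(f"mixed: {w_mixed:.2f} min, constant: {w_fixed:.2f} min, "
      f"ratio: {w_mixed / w_fixed:.1f}")
```

At the same utilization, the bimodal workload waits roughly eight times longer than the constant one, because the ratio of the two waits is simply $E[S^2] / (E[S])^2$.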
This article was originally written for marimo.io.
Two processing stages are arranged in series: Stage 1 feeds work into a bounded buffer, which feeds Stage 2. Both stages have the same mean service rate $\mu$, and the arrival rate satisfies $\lambda < \mu$, so neither stage is overloaded on average. However, Stage 1 has high variance (hyperexponential service), while Stage 2 has zero variance (deterministic service).
Even though both stages have identical mean service rates and the system is underloaded, Stage 2 sits idle for a substantial fraction of time when the buffer between them is small. This extra idle time, beyond the $1 - \lambda/\mu$ that any underloaded stage must have, disappears only in the limit as the buffer size $K \to \infty$.
High service-time variance at Stage 1 produces bursts of output—many jobs finish close together—followed by droughts. With a small buffer, the burst overflows (blocking Stage 1) and the drought starves Stage 2. Both effects reduce system throughput below what we would intuitively expect.
Analysis
For a two-stage tandem queue with a finite buffer of capacity $K$, the blocking probability at Stage 1 and the starvation probability at Stage 2 depend on the full service-time distributions, not just their means. The Kingman approximation gives the mean wait in a single G/G/1 queue as:
$$W_q \approx \left(\frac{c_a^2 + c_s^2}{2}\right)\left(\frac{\rho}{1-\rho}\right)E[S]$$
where $c_a^2$ and $c_s^2$ are the squared coefficients of variation of inter-arrival and service times respectively. For a hyperexponential service distribution with $c_s^2 \gg 1$, waiting times are far higher than the M/M/1 formula predicts.
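A minimal sketch of Kingman's approximation $W_q \approx \frac{c_a^2 + c_s^2}{2} \cdot \frac{\rho}{1-\rho} \cdot E[S]$, assuming a hypothetical utilization of 0.8, unit mean service time, and Poisson arrivals ($c_a^2 = 1$). The value $c_s^2 = 5.5$ stands in for a bursty hyperexponential stage; it and the name `kingman_wait` are illustrative.

```python
def kingman_wait(rho, ca2, cs2, es):
    """Kingman approximation for the mean wait in a G/G/1 queue:
    variability term * utilization term * mean service time."""
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1)")
    return (ca2 + cs2) / 2 * rho / (1 - rho) * es


rho, es = 0.8, 1.0  # hypothetical utilization and mean service time
for label, cs2 in [("deterministic", 0.0), ("exponential", 1.0), ("hyperexponential", 5.5)]:
    w = kingman_wait(rho, ca2=1.0, cs2=cs2, es=es)  # Poisson arrivals: ca2 = 1
    print(f"{label:>16}: Wq ~ {w:.2f}")
```

With $c_a^2 = c_s^2 = 1$ this gives 4.0, the exact M/M/1 value $\rho/(1-\rho) \cdot E[S]$; the bursty stage waits more than three times as long (13.0) at the same mean throughput. With Poisson arrivals the approximation coincides with the Pollaczek–Khinchine formula.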
In a tandem network, this extra variability propagates: the departure process of Stage 1 (which is the arrival process for Stage 2) is more variable than a Poisson process when Stage 1 has high service variance. This propagation of variability through departure processes is a key driver of the bullwhip effect in manufacturing and supply chains.
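One common way to quantify this propagation is the single-server "linking equation" from the factory-physics literature (Hopp and Spearman), $c_d^2 \approx \rho^2 c_s^2 + (1-\rho^2) c_a^2$. It is an approximation brought in from outside this lesson; the sketch assumes Poisson arrivals into Stage 1 ($c_a^2 = 1$) and a hypothetical $c_s^2 = 5.5$.

```python
def departure_scv(rho, ca2, cs2):
    """Linking-equation approximation for one server's departures:
    c_d^2 ~ rho^2 * c_s^2 + (1 - rho^2) * c_a^2."""
    return rho**2 * cs2 + (1 - rho**2) * ca2


# Poisson arrivals into Stage 1 (ca2 = 1), bursty Stage 1 service (cs2 = 5.5).
for rho in (0.3, 0.6, 0.9):
    cd2 = departure_scv(rho, ca2=1.0, cs2=5.5)
    print(f"rho = {rho}: Stage 2 sees arrivals with c_a^2 ~ {cd2:.2f}")
```

The result stays at 1 (Poisson-like) for a nearly idle server and climbs toward $c_s^2$ as utilization rises, so a busier bursty Stage 1 hands Stage 2 burstier arrivals.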
Buffer as a Variability Absorber
The buffer acts as a shock absorber. Each additional unit of buffer capacity reduces the starvation probability at Stage 2 by absorbing burst output from Stage 1. The marginal benefit decreases as $K$ grows, a classic diminishing-returns relationship. Practitioners use this to size work-in-progress (WIP) inventory buffers in manufacturing cells.
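A small simulation sketch makes the buffer trade-off visible. It assumes blocking after service (a finished Stage 1 job stays put until there is room downstream), Poisson arrivals at a hypothetical rate $\lambda = 0.8$, $\mu = 1$ at both stages, and a hypothetical hyperexponential Stage 1 (90% of jobs with mean 0.5, 10% with mean 5.5, giving mean 1.0 and $c_s^2 = 5.5$). Instead of an event queue it uses a recursion on each job's start and departure times, which is exact for this serial model. `simulate_tandem` is an illustrative name, not a library function.

```python
import random


def simulate_tandem(n_jobs, lam, buffer_k, seed=0):
    """Simulate a two-stage serial line with blocking after service.

    Stage 1: hyperexponential service, 90% of jobs Exp(mean 0.5) and 10%
             Exp(mean 5.5), so mean 1.0 and squared CV 5.5 (hypothetical).
    Stage 2: deterministic service time 1.0.
    Between them: a buffer with buffer_k slots. A job that finishes Stage 1
    while the buffer is full and Stage 2 is busy stays put, blocking Stage 1.
    Returns (fraction of time Stage 2 is idle, mean time in system).
    """
    rng = random.Random(seed)
    t = 0.0
    arrive, leave1, leave2 = [], [], []
    for n in range(n_jobs):
        t += rng.expovariate(lam)                # Poisson arrivals
        arrive.append(t)
        # Three draws per job for every K, so different K values share samples.
        mean1 = 0.5 if rng.random() < 0.9 else 5.5
        s1 = rng.expovariate(1 / mean1)
        start1 = max(t, leave1[-1]) if n else t  # Stage 1 may be busy or blocked
        done1 = start1 + s1
        j = n - buffer_k - 1                     # job n needs job j gone from Stage 2
        leave1.append(max(done1, leave2[j]) if j >= 0 else done1)
        start2 = max(leave1[-1], leave2[-1]) if n else leave1[-1]
        leave2.append(start2 + 1.0)              # deterministic Stage 2
    busy2 = n_jobs * 1.0
    idle2 = 1 - busy2 / leave2[-1]               # over [0, last departure]
    mean_sojourn = sum(d - a for a, d in zip(arrive, leave2)) / n_jobs
    return idle2, mean_sojourn


for k in (0, 1, 2, 5, 10, 20):
    idle, w = simulate_tandem(50_000, lam=0.8, buffer_k=k, seed=1)
    print(f"K={k:>2}: Stage 2 idle {idle:.1%}, mean time in system {w:.1f}")
```

Because every buffer size sees the same random draws, Stage 2's idle fraction and the mean time in system can only fall (or stay level) as $K$ grows. Expect the idle fraction to level off near $1 - \lambda/\mu = 0.2$; exact figures depend on the seed and run length.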
Understanding the Math
Coefficient of variation
The coefficient of variation (CV) of a random variable $X$ with mean $\mu$ and standard deviation $\sigma$ is defined as $c = \sigma / \mu$. It measures spread relative to the mean. A CV of 0 means the variable is deterministic: every value equals $\mu$. A CV of 1 means the spread equals the mean (the exponential distribution has CV exactly 1). A CV greater than 1 means the distribution is bursty: occasional very large values dominate, even if most values are small. The squared CV $c^2 = \sigma^2/\mu^2$ appears frequently in queueing formulas.
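A sketch computing CVs two ways: empirically from a list of values, and in closed form for a two-branch hyperexponential (each exponential branch with mean $m$ has $E[X^2] = 2m^2$). The 90/10 mix with branch means 0.5 and 5.5 is a hypothetical example of a bursty distribution; the function names are illustrative.

```python
import math
import statistics


def cv(values):
    """Coefficient of variation of a list: population std dev / mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def cv_hyperexponential(p, mean_a, mean_b):
    """CV of a value that is Exponential(mean_a) with probability p,
    otherwise Exponential(mean_b)."""
    mean = p * mean_a + (1 - p) * mean_b
    second = p * 2 * mean_a**2 + (1 - p) * 2 * mean_b**2  # E[X^2] = 2 m^2 per branch
    return math.sqrt(second - mean**2) / mean


print(cv([5.0, 5.0, 5.0]))                # deterministic: every value equals the mean
print(cv_hyperexponential(0.5, 1.0, 1.0))  # equal branches: a plain exponential
print(cv_hyperexponential(0.9, 0.5, 5.5))  # bursty: many short values, a few huge ones
```

A constant list gives 0, equal branches collapse to an exponential with CV 1, and the 90/10 mix gives a CV of about 2.35 (squared CV about 5.5).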
Why high CV at Stage 1 creates bursts and droughts
With a hyperexponential service distribution ($c_s^2 \gg 1$), Stage 1 sometimes completes several jobs in rapid succession (a burst) and sometimes spends a very long time on a single job (a drought). During a burst, jobs pile up in the buffer between stages. During a drought, Stage 2 exhausts the buffer and has to wait for Stage 1 to finish. This wastes capacity even though the system is underloaded on average.
The buffer as a shock absorber
A buffer of capacity $K$ can hold at most $K$ jobs between the two stages. It absorbs burst output from Stage 1 and releases it steadily to Stage 2. With a small buffer, a burst overflows (blocking Stage 1) or a drought empties the buffer (starving Stage 2). As $K$ grows, both effects weaken and Stage 2 idle time falls. However, the marginal benefit of each extra unit of buffer decreases. Real factories choose $K$ to balance the cost of holding inventory against the cost of machine starvation.
Kingman’s approximation
For a single-stage queue with general service and arrival distributions, the approximate mean waiting time is:
$$W_q \approx \left(\frac{c_a^2 + c_s^2}{2}\right)\left(\frac{\rho}{1-\rho}\right)E[S]$$
Here $c_a^2$ is the squared CV of inter-arrival times and $c_s^2$ is the squared CV of service times. Notice that the formula separates the utilization effect (the $\rho/(1-\rho)$ term) from the variability effect (the $(c_a^2 + c_s^2)/2$ term). When Stage 1 has $c_s^2 \gg 1$, wait time is far higher than the basic M/M/1 formula predicts, even at the same mean throughput. The mean alone does not tell you enough.
Supply-chain connection
In manufacturing, Stage 1 corresponds to a supplier and Stage 2 to a production line. High variability at the supplier forces the factory to hold large work-in-progress (WIP) buffers, tying up capital and floor space. The Toyota Production System attacks this by making every process more deterministic through standardized work and small batch sizes, which shrinks the WIP it needs. The math here shows why: a lower $c_s^2$ directly reduces $W_q$ and the required buffer size $K$.
Why is the bus always late? Buses arrive at a stop with some average headway (gap between buses) of $\mu$ minutes. A passenger arrives at a uniformly random time and waits for the next bus. How long do they wait? The naive answer is $\mu / 2$: on average you land in the middle of a gap. The correct answer is almost always longer, sometimes much longer. The expected wait is not $\mu/2$ but:
$$E[\text{wait}] = \frac{\mu}{2} + \frac{\sigma^2}{2\mu}$$
where $\sigma^2 = \text{Var}[\text{headway}]$. The second term is always non-negative, so higher variance always means longer expected waits, even when the mean headway is unchanged.
Three Bus Schedules with Mean Headway $\mu = 10$
| Schedule | $\sigma^2$ (min²) | Predicted wait (min) | Naive wait (min) |
|---|---|---|---|
| Regular | 0 | 5.0 | 5.0 |
| Exponential | 100 | 10.0 | 5.0 |
| Clustered | 64 | 8.2 | 5.0 |
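The "Predicted wait" column follows directly from $E[\text{wait}] = \mu/2 + \sigma^2/(2\mu)$. A minimal sketch that reproduces it; the function name `expected_wait` is illustrative.

```python
def expected_wait(mu, var):
    """Expected wait for a passenger arriving at a random time,
    given mean headway mu and headway variance var."""
    return mu / 2 + var / (2 * mu)


mu = 10.0
for name, var in [("Regular", 0.0), ("Exponential", mu**2), ("Clustered", 64.0)]:
    print(f"{name:<12} predicted {expected_wait(mu, var):4.1f}  naive {mu / 2:4.1f}")
```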
For exponentially distributed headways, $\sigma^2 = \mu^2$, so:
$$E[\text{wait}] = \frac{\mu}{2} + \frac{\mu^2}{2\mu} = \mu$$
A passenger waits on average for an entire mean headway — twice the naive expectation.
Why This Happens: Length-Biased Sampling
A passenger arriving at a random time is more likely to land inside a long gap than a short one, because long gaps occupy more time on the clock. This is called length-biased sampling. The interval containing your arrival is not a random headway: it is drawn from the length-biased distribution with density:
$$f^*(h) = \frac{h \cdot f(h)}{\mu}$$
The mean of this biased distribution is $\mu + \sigma^2/\mu$, and you arrive uniformly within it, giving expected wait $(\mu + \sigma^2/\mu)/2$.
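A Monte Carlo check of length-biased sampling, assuming exponential headways with a hypothetical mean of $\mu = 10$ minutes: it generates a long run of buses, drops passengers at uniformly random times, and measures how long each waits. `simulated_wait` is an illustrative helper.

```python
import bisect
import itertools
import random


def simulated_wait(headways, n_passengers, rng):
    """Mean wait of passengers who arrive uniformly at random while buses run
    with the given headways (bus k arrives at the sum of the first k+1 gaps)."""
    bus_times = list(itertools.accumulate(headways))
    total = 0.0
    for _ in range(n_passengers):
        t = rng.uniform(0.0, bus_times[-1])
        # First bus arriving at or after time t.
        next_bus = bus_times[bisect.bisect_left(bus_times, t)]
        total += next_bus - t
    return total / n_passengers


rng = random.Random(1)
mu = 10.0
headways = [rng.expovariate(1 / mu) for _ in range(100_000)]  # exponential gaps
w = simulated_wait(headways, 100_000, rng)
print(f"naive mu/2 = {mu / 2:.1f}, simulated wait = {w:.2f}")
```

The simulated mean wait should come out close to 10 minutes, the length-biased prediction, rather than the naive 5.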
The same phenomenon explains why the average class size experienced by a student exceeds the average class size reported by the university (large classes have more students to report them).
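The class-size version is the same length-biased average, with class size playing the role of gap length: each class is counted once per enrolled student. A sketch with hypothetical class sizes; the helper name is illustrative.

```python
def experienced_mean(sizes):
    """Mean class size as seen by students: each class is counted once
    per student enrolled in it (a size-weighted average)."""
    return sum(s * s for s in sizes) / sum(sizes)


sizes = [10, 10, 100]              # hypothetical class sizes
reported = sum(sizes) / len(sizes)  # what the university reports
print(reported, experienced_mean(sizes))
```

The university reports 40, but students experience an average of 85, matching $\mu + \sigma^2/\mu = 40 + 1800/40$.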
Why “Inspector’s Paradox”?
The name comes from quality control, where an inspector arrives at a random time to sample a production process and systematically encounters longer-than-average intervals. The paradox is that a random observer is more likely to land inside a long gap than a short one, so their experienced mean interval exceeds the true mean interval. It feels paradoxical because you’d expect a random arrival to see the average gap, but length-biased sampling guarantees they see worse-than-average gaps whenever there’s any variance at all.
Understanding the Math
Length-biased sampling
Suppose buses run on an irregular schedule where gaps between buses are either 2 minutes or 18 minutes, each with probability 1/2. The mean gap is $\mu = (2 + 18)/2 = 10$ minutes. Now ask: if you arrive at a completely random moment, which gap are you most likely to land inside?
A 2-minute gap occupies only 2 minutes on the clock, but an 18-minute gap occupies 18. Out of every 20 minutes of clock time on average, 2 minutes belong to a short gap and 18 to a long one. So a random arrival lands in a short gap with probability $2/(2+18) = 1/10$ and in a long gap with probability $18/20 = 9/10$. The expected gap length you experience is:
$$2 \cdot \frac{1}{10} + 18 \cdot \frac{9}{10} = 0.2 + 16.2 = 16.4 \text{ minutes}$$
That is far above the mean gap of 10 minutes. You are disproportionately likely to land inside a long gap simply because it takes up more time.
The wait formula
Once you are inside a gap, you arrive uniformly within it, so on average you land in the middle. Your expected wait is half the gap length you experience. The full formula is:
$$E[\text{wait}] = \frac{\mu}{2} + \frac{\sigma^2}{2\mu}$$
Here $\mu$ is the mean gap and $\sigma^2 = \text{Var}[\text{gap}]$ is the variance of gap lengths. The first term, $\mu/2$, is what you would get if every gap were exactly $\mu$ (deterministic buses — arrive in the middle every time). The second term, $\sigma^2/(2\mu)$, is the extra waiting from length-biased sampling. It is always non-negative, so irregular buses always make you wait longer than regular buses with the same mean headway.
Why variance matters
The variance $\sigma^2$ measures how spread out the gap sizes are. A perfectly regular bus schedule has $\sigma^2 = 0$ and gives the naive answer $\mu/2$. An exponentially distributed schedule has $\sigma^2 = \mu^2$, which doubles the expected wait to $\mu$. More irregular buses, higher penalty.
Connecting to expected values
The formula arises from a standard result: the expected length of the gap containing a random arrival is $\mu + \sigma^2/\mu$. You can think of this as the mean gap plus a correction term, $\sigma^2/\mu$, equal to the variance divided by the mean. Dividing by 2 (uniform arrival within the gap) gives the wait formula above.
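Written out step by step, with $H$ a headway and $H^*$ the length-biased gap containing the arrival (symbols introduced here for the derivation), the result follows from the density $f^*(h) = h f(h)/\mu$ and the identity $E[H^2] = \text{Var}(H) + (E[H])^2$:

$$E[H^*] = \int_0^\infty h \, f^*(h)\,dh = \frac{1}{\mu}\int_0^\infty h^2 f(h)\,dh = \frac{E[H^2]}{\mu} = \frac{\sigma^2 + \mu^2}{\mu} = \mu + \frac{\sigma^2}{\mu}$$

$$E[\text{wait}] = \frac{1}{2}\,E[H^*] = \frac{\mu}{2} + \frac{\sigma^2}{2\mu}$$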