
Refactoring the Simulation

After yesterday’s experiments with rework I was going to spend a post or two building some charts, but apparently people find a task-centric simulation less natural than a worker-centric one, so instead I’m going to do some rearchitecting.

Parameters

Let’s start by going back to a pool of developers doing tasks one at a time without rework. As before, we’ll define some default parameters, including a random number seed to ensure reproducibility:

PARAMS = {
    "max_task_duration": 10.0,
    "num_developers": 2,
    "random_seed": 12345,
    "simulation_duration": 10,
    "task_arrival_rate": 2.0,
}

To allow ad hoc experiments, we can add a helper function update_params to parse a command line like:

python sim.py simulation_duration=50 random_seed=67890

and override the default parameters. (You can see the function in the source code.)
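
For the curious, a minimal version of update_params might look like the following. This is a sketch of my own, not the actual function; in particular, coercing each value to the type of the corresponding default is an assumption about how it behaves:

```python
def update_params(params, argv):
    """Override default parameters with 'key=value' command-line arguments.

    A sketch only: the real function is in the source code for this post.
    Coercion works for int, float, and str defaults.
    """
    for arg in argv:
        key, _, value = arg.partition("=")
        if key not in params:
            raise KeyError(f"unknown parameter {key!r}")
        # convert the string to the same type as the default value
        params[key] = type(params[key])(value)
    return params

PARAMS = {
    "max_task_duration": 10.0,
    "num_developers": 2,
    "random_seed": 12345,
    "simulation_duration": 10,
    "task_arrival_rate": 2.0,
}
update_params(PARAMS, ["simulation_duration=50", "random_seed=67890"])
```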

Bundling Simulation Elements

As our simulation becomes more complex, we’re going to have more assets to manage: pools of developers, a work queue, and eventually some testers and maybe even a manager or two. Rather than passing a dozen objects around, let’s define a Simulation class to store them all:

class Simulation:
    """Store simulation artifacts."""

    def __init__(self, params):
        self.params = params
        self.env = simpy.Environment()
        # ...more stuff will go here

    def process(self, proc):
        self.env.process(proc)

    def run(self):
        self.env.run(until=self.params["simulation_duration"])

    def timeout(self, duration):
        return self.env.timeout(duration)

    @property
    def now(self):
        return self.env.now

    def task_arrival(self):
        return random.expovariate(1.0 / self.params["task_arrival_rate"])

    def task_duration(self):
        return random.uniform(1, self.params["max_task_duration"])

To save ourselves from repeatedly needing to type sim.env, this class provides pass-through methods for the simulation environment’s key methods and properties. And since it is storing the simulation parameters, it seems like the logical place to put the code that uses those parameters to generate task arrival times and durations.

Tasks

This version of the simulator has active workers and passive tasks, i.e., the workers are going to be processes that act on tasks. A task is therefore just a sack full of properties like when it arrived, how long it takes to complete, when work on it started, and when work was completed:

class Task(Labeled):
    """A single (passive) task."""

    # ...more stuff will go here

    def __init__(self, sim):
        """Construct."""

        super().__init__()
        self.arrived = sim.now
        self.duration = sim.task_duration()
        self.started = None
        self.completed = None

        # ...more stuff will go here

Note that Task is derived from a generic class called Labeled. This class automatically gives each instance of Task a unique integer ID and keeps a list of all the tasks created so far; if you’re interested, you can see the magic in the source code.
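
Since we'll lean on Labeled throughout, here's one way it could work. This is a reconstruction based on how the class is used (the self.id attribute and the per-class registry Labeled._all), not the actual code:

```python
from collections import defaultdict
from itertools import count

class Labeled:
    """Give each instance a unique sequential ID and record it by class name.

    A guess at the real implementation, reconstructed from usage.
    """

    _counters = defaultdict(count)  # one ID counter per derived class
    _all = defaultdict(list)        # all instances, keyed by class name

    def __init__(self):
        name = type(self).__name__
        self.id = next(Labeled._counters[name])
        Labeled._all[name].append(self)

class Task(Labeled):
    pass

tasks = [Task() for _ in range(3)]
```

Each subclass gets its own counter, so tasks and (later) developers are numbered independently.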

Where Do Tasks Go?

Before we move on to modeling developers, let’s define a generator to create new tasks and give the simulation a place to store them. For the latter, we will use a SimPy Store, which provides a queue with put and get methods:

class Simulation:

    def __init__(self, params):
        # ...previous code
        self.queue = simpy.Store(self.env)

We then define a generator to add new tasks to this queue at random intervals. This generator could be a free-standing function, but we’re likely to want to add other generators to the simulation in the future, so let’s make it a static method of the Task class instead:

class Task(Labeled):

    @staticmethod
    def generate(sim):
        while True:
            yield sim.timeout(sim.task_arrival())
            task = Task(sim)
            yield sim.queue.put(task)

As we explained in the first post in this series, generate’s use of yield means that it isn’t a regular function. Instead, it creates a generator that SimPy can suspend and restart as often as it wants. Each time around the while loop, generate calls sim.task_arrival() to generate a random delay, then calls sim.timeout() to create an object that means, “Please suspend this generator for this length of time.” The first yield inside the loop passes that object to SimPy, which parks the generator until that much simulated time has passed. When the generator resumes, it creates a new task object and adds it to the queue of pending work.

My first implementation of this function had a bug: it called sim.queue.put(task) but didn’t yield the result. It took me a couple of minutes to remember that a SimPy Store can have a limited capacity, and that an attempt to add a new item to a Store will block until there’s room for it. sim.queue.put(task) therefore doesn’t actually add the object to the store; instead, it creates a temporary object that means, “Please add this to the store when there’s space.” The generator has to give that temporary object to SimPy with yield so that the framework can suspend the generator if necessary. Our work queue has infinite capacity, so the generator will never block waiting for space, but we still need to use yield.

Modeling Developers

With all that in place, modeling each developer as an active process is fairly straightforward:

class Developer(Labeled):
    """A single (active) developer."""

    def __init__(self, sim):
        """Construct."""

        super().__init__()
        self.sim = sim

    def work(self):
        """Simulate work."""

        while True:
            task = yield self.sim.queue.get()
            task.started = self.sim.now
            yield self.sim.timeout(task.duration)
            task.completed = self.sim.now

As with Task, we derive Developer from Labeled so that each developer will automatically have a unique integer ID. Developer.work repeatedly gets a task from the work queue (blocking if necessary until one is available), records the task’s start time, suspends itself for some simulated length of time, and then records the task’s completion time.

Logging

Once the simulation is done, we’re going to want to look at how many tasks were completed and how long tasks spent waiting for a developer:

def write_log(stream):
    """Write task details as CSV."""

    log = [("kind", "id", "duration", "arrived", "started", "completed")]
    log.extend(task.log() for task in Labeled._all["Task"])
    csv.writer(stream, lineterminator="\n").writerows(log)

Labeled._all["Task"] is a list of all the task objects created so far (again, see the source code if you want details), while task.log() is just:

class Task(Labeled):

    def log(self):
        """Convert to loggable entry."""

        return (
            "task",
            self.id,
            log_fmt(self.duration),
            log_fmt(self.arrived),
            log_fmt(self.started),
            log_fmt(self.completed)
        )

with a bit of helper code:

PRECISION = 2

def log_fmt(val):
    return None if val is None else round(val, PRECISION)

The Output

If we run the simulation with the default parameter values, we get this output:

kind,id,duration,arrived,started,completed
task,0,1.09,1.08,1.08,2.17
task,1,3.69,4.57,4.57,8.25
task,2,2.74,5.49,5.49,8.23
task,3,2.46,7.15,8.23,
task,4,4.9,7.42,8.25,
task,5,2.57,9.07,,

If we reformat this as a table:

kind id duration arrived started completed
task 0 1.09 1.08 1.08 2.17
task 1 3.69 4.57 4.57 8.25
task 2 2.74 5.49 5.49 8.23
task 3 2.46 7.15 8.23
task 4 4.9 7.42 8.25
task 5 2.57 9.07

we can see that work started on the first three tasks as soon as they arrived, that work started on the next two after a delay but didn’t finish by the time the simulation ended, and that the last task wasn’t even started.

Some Introspection

Is this worker-centric design better than the previous task-centric design? It was certainly easier to write, but that’s probably because I learned more about SimPy and thought through design questions while working on the earlier versions. The real test is whether you, the reader, find it easier or harder to follow. If you have an opinion, please reach out.

Simulating Rework

In yesterday’s post we started collecting data from our simulation, and discovered that our metrics will always give us numbers, but those numbers might be garbage if we’re measuring the wrong things or the wrong way. In this post we’ll make the simulation more realistic by taking into account the fact that most first attempts to fix a bug either don’t fix it or introduce new bugs.

Rework

Let’s introduce a new parameter to specify the probability that a task needs to be re-done. For the moment, we’ll assume this probability doesn’t depend on how many attempts have been made to finish the task so far. While we’re at it, we’ll introduce another parameter to specify how many times we want to run the simulation:

PARAMS = {
    # ...previous parameters
    "rework_probability": 0.6,
    "num_simulations": 1,
}

Next, we introduce a loop in simulate_task. Each time around the loop, we get a developer, spend some time on the task, and then check if we’re done. If we are, we mark the task as completed and break out of the loop; if not, we go around the loop again.

def simulate_task(params, env, developers, task):
    """Simulate a task flowing through the system."""

    while True:
        with developers.request() as req:
            yield req
            task.start(env)
            yield env.timeout(task._duration)
            task.end(env)
            if random.uniform(0, 1) >= params["rework_probability"]:
                task.completed()
                break

If we run our simulation a few times for 100,000 timesteps with two developers, we complete an average of 14,550 tasks, which works out to about 13.7 timesteps per task. That’s about 40% of the 36,000 tasks we complete in the same time without rework, which is what we’d expect: with a rework probability of 0.6, each attempt succeeds with probability 0.4, so tasks need an average of 1/0.4 = 2.5 attempts. This gives us a baseline against which to compare a more interesting variation on this model.

Keeping the Same Developer

When a task needs to be redone, the code shown above always grabs the next available developer. In most development teams, though, the same developer would keep working on the task.

We need to make two changes to the simulation to capture this. First, we create a FilterStore instead of a Resource to represent our pool of developers:

    developers = simpy.FilterStore(env, capacity=params["num_developers"])

The difference between the two is that FilterStore allows us to specify a test or check that a resource has to satisfy when we claim it. Here’s the modified simulate_task that makes use of that:

def simulate_task(params, env, developers, task):
    """Simulate a task being re-worked by the same developer."""

    dev = None
    while True:
        if dev is None:
            dev = yield developers.get()
        else:
            dev = yield developers.get(lambda item: item._id == dev._id)
        task.start(env)
        yield env.timeout(task._duration)
        developers.put(dev)
        task.end(env)
        if random.uniform(0, 1) >= params["rework_probability"]:
            task.completed()
            break

Initially, the task isn’t associated with a developer so dev is set to None. The first time around the loop, we use developers.get() without a test to claim a developer. After that, we always re-claim the same developer by passing a function to developers.get() that says, “The only developer we want is the one with the same ID.” We then do some work, put the developer back in the pool, and check to see if we’re done with this task or not.

Does It Make a Difference?

So here’s our question: does this policy change the total number of tasks completed or not? On the one hand we could argue that it would, since tying each task to a single developer means we can’t take advantage of other developers who happen to be idle. On the other hand, our developers are currently working almost 100% of the time, so there aren’t actually idle cycles going to waste.

The answer is that under the assumptions baked into our simulation this change in policy doesn’t have an impact on the total number of tasks that are completed. If we increase the number of developers and lower the rate at which new tasks arrive, though, that answer changes: if developers do have idle cycles, having developers “own” tasks is less efficient than allowing whoever’s free to work on whatever needs done.

Of course, that model doesn’t take into account the ramp-up time that real developers need to familiarize themselves with new problems. (Putting it another way, it doesn’t account for the time saved by having someone who understands the problem go back to it.) We could add another parameter to our model and another few lines of code to capture this, but before we do that, we need to put some better analysis machinery in place.

Looking for Work

I was laid off by Plotly a few weeks ago, so I’m looking for something new. I would prefer to work in education, developer experience, open source, or scientific computing, and I’m open to part-time, full-time, or contracting positions, either in Toronto or remote. There’s more about me at https://third-bit.com/about/; if you’d like to chat, please give me a shout.


Later: a couple of people have asked me what I’d like most to do. My answer has evolved a bit since I wrote this post:

  1. Build and deliver a workshop on organizational change for people working in open source and open science. I’m really tired of watching everyone roll the same rocks up the same hills year after year; if we don’t change the system, that’s never going to get better.

  2. I want to say “teach scientists how to build their own software and manage their data”, because that’s the most rewarding thing I’ve ever done professionally, but what I really want is to do a multi-year study of the impact that this training has. Back when I was a professor, I asked NSERC to fund some Carpentries-style workshops for undergraduates who are spending a summer in a research lab and then ask (a) whether it helped them immediately and (b) whether they were then more likely to go on to graduate school. NSERC said no, but maybe someone else would see the value.

  3. I miss coding—I realized a while back that nothing I’ve written is still in production use anywhere in the world. I’d be proud and grateful for a chance to help build something related to climate crisis resilience, defending human rights, or making open source development a sustainable career choice.

There are lots of other things I’d enjoy working on, including a tower defense game in which the player is trying to help the travellers on the road instead of killing them, a mod for Stardew Valley that turns it into a cozy murder mystery game, a new kind of undergraduate software engineering course, or any of these books.

What I want most of all, though, is to do whatever I do next with a group of people I enjoy hanging out with. I still miss some of my colleagues from the Carpentries, RStudio, and Deep Genomics, and I’m sure I’m going to miss a lot of the folks I worked with at Plotly; if I’ve learned anything in sixty-two years, it’s that in the long run, the “who” matters at least as much as the “what”.

Time for another cup of tea. If you came in peace, be welcome.

Making Sense of Simulation

I ended yesterday’s post by saying that we need to collect some data and do a bit of analysis. SimPy doesn’t do monitoring out of the box, so we have to roll our own.

Tracking Time in Tasks

The first step is to update the class that represents a task so that it can keep track of how much time it takes. To do this, we add two fields called ._current (the most recent start-of-work time) and ._elapsed (the total time so far). We then add three methods: .start to signal the start of work, .end to signal the end of work, and .is_complete so that we can ask if the task has been completed:

class TaskRecorder:
    """Task with uniformly-distributed durations that records activity."""

    _id = count()

    def __init__(self, params):
        self._id = next(TaskRecorder._id)
        self._duration = TaskRecorder.duration(params)
        self._elapsed = 0.0
        self._current = None

    def start(self, env):
        assert self._current is None
        self._current = env.now

    def end(self, env):
        assert self._current is not None
        self._elapsed += env.now - self._current
        self._current = None

    def is_complete(self):
        return (self._current is None) and (self._elapsed > 0.0)

The logic of .is_complete needs a bit of explanation. When the task is initially created, its current start time is None. When it is being executed, ._current records the (simulated) time at which work started, and when it’s done, we re-set ._current to None and record the elapsed time in ._elapsed. The task is therefore complete if ._current is None and some elapsed time has been set.

With this in place, we can modify the generator that simulates task execution to signal the start and stop of work:

def simulate_task(env, developers, task):
    """Simulate a task flowing through the system."""

    task.start(env)
    with developers.request() as req:
        yield req
        yield env.timeout(task._duration)
        task.end(env)

(We pass the simulation environment env to task.start and task.end so that the task can check the current simulated time with env.now.)

Gathering Statistics

Now that each task is recording its elapsed time, we need to gather statistics for all of them. Let’s go back to the TaskRecorder class and have each new task add itself to a list owned by the class:

class TaskRecorder:
    """Task with uniformly-distributed durations that records activity."""

    # ...previous code
    _all = []

    def __init__(self, params):
        # ...previous code
        TaskRecorder._all.append(self)

When the simulation is finished, we can walk through this list to find out how much time was spent on each task and whether that task was completed or not:

PRECISION = 2


def write_log(params, stream):
    """Create and write entire log."""

    log = [("kind", "id", "elapsed", "completed")]
    log.extend([
        ("task", x._id, round(x._elapsed, PRECISION), x.is_complete())
        for x in TaskRecorder._all
    ])
    total = sum(x._elapsed for x in TaskRecorder._all)
    log.append(("total", "task", round(total, PRECISION), None))
    csv.writer(stream, lineterminator="\n").writerows(log)


def main(params):
    # ...previous code
    write_log(params, sys.stdout)

Let’s run it for 100 simulated timesteps with two developers:

kind,id,elapsed,completed
task,0,1.09,True
task,1,3.69,True
task,2,2.74,True
…,…,…,…
task,39,0.0,False
task,40,0.0,False
task,41,0.0,False
total,task,378.68,

Cool! Our simulation is working—except it isn’t.

Don’t Trust, Just Verify

Take a look at the last line of the output shown above. It’s telling us that our workers did a total of 378.68 timesteps of work during this simulation, but that’s impossible: we have two developers and the simulation ran for 100 timesteps, so even if both developers were busy 100% of the time, they couldn’t have done more than 200 timesteps of work. What’s gone wrong?

The answer lies in simulate_task, which we reproduce here from above:

def simulate_task(env, developers, task):
    task.start(env)
    with developers.request() as req:
        yield req
        yield env.timeout(task._duration)
        task.end(env)

The problem is that we’re including the time the task spends waiting for a developer in our measure of how long the task was worked on. The fix is to move task.start after yield req, so that we start the task’s clock when we start work on it:

def simulate_task(env, developers, task):
    with developers.request() as req:
        yield req
        task.start(env)
        yield env.timeout(task._duration)
        task.end(env)

When we re-run the simulation with the same parameters, we get:

kind,id,elapsed,completed
task,0,1.09,True
task,1,3.69,True
task,2,2.74,True
…,…,…,…
task,39,0.0,False
task,40,0.0,False
task,41,0.0,False
total,task,149.41,

This result is physically plausible but still puzzling: why are the developers only busy 75% of the time? Let’s try running the simulation for 100,000 timesteps instead:

…,…,…,…
total,task,199961.93,

That works out to 99.98% utilization, so the earlier figure of 75% seems to be due to warmup effects.

What We’ve Learned

What I learned from this bug is that badly-designed measurements can produce numbers that are wrong without being obviously wrong. I didn’t immediately notice that worker utilization in the buggy simulation was impossible; I don’t believe I would have noticed at all if the measure or the simulation were more complicated. As I build more realistic simulations, I’m going to add checks on the results based on the properties of the model (e.g., “we can’t have done more work than is physically possible”). Similarly, when we start to monitor real developers, I’m going to think very carefully about whether what I’m measuring actually measures what I want it to.

I also have to go back and figure out exactly what “warmup effects” means: right now, I’m just waving my hands and hoping you won’t notice that I glossed over that.

Simulating a Developer Pool

Following yesterday’s post about using SimPy for discrete event simulation, let’s have a look at a slightly more interesting scenario: a pool of developers, all of whom work at the same speed, handling tasks of varying duration that arrive at random intervals.

Overall Structure

The main function of the simulation takes a dictionary of parameter values, creates a SimPy environment, does a little bit of magic, and then simulates the system for a specified length of time:

def main(params):
    """Run simulation."""

    random.seed(params["random_seed"])
    env = simpy.Environment()
    developers = simpy.Resource(env, capacity=params["num_developers"])
    env.process(generate_tasks(params, env, developers))
    env.run(until=params["simulation_duration"])

The “magic” referred to earlier has two parts:

  1. Since the developers are all the same (for now), we can model them using a SimPy Resource with a fixed capacity. This basically acts as a pile of things: our processes can take a thing if one is available, but will be suspended if one is not.
  2. We then call generate_tasks to create the generator that adds new tasks to the system and give that generator to env.process to run.

Generating Tasks

OK, what does generate_tasks do?

def generate_tasks(params, env, developers):
    """Generates tasks at random intervals."""

    while True:
        yield env.timeout(random.expovariate(1.0 / params["task_arrival_rate"]))
        task = TaskUniform(params)
        env.process(simulate_task(env, developers, task))

Each time around the loop, it calls env.timeout to create an object representing a delay and uses yield to hand that to SimPy, which suspends the process until that much simulated time has passed. When it resumes, the generator creates a new TaskUniform object (we’ll look at that in a moment) and then calls simulate_task to create a new process (i.e., a new generator) to simulate the execution of that task. (The while True in this function is a bit misleading: this generator hands control back to SimPy each time it encounters the yield statement, and only resumes if SimPy reschedules it, so it only runs as long as SimPy wants it to.)

When I first started working with SimPy, I found it a bit counter-intuitive to represent tasks as processes rather than using processes for the workers. It’s possible to do the latter, but the code turns out to be simpler if we treat developers as a passive resource. Read into that what you will…

What Kind of Randomness?

At this point we need to talk about math, because random.expovariate is anything but intuitive. Suppose that the probability of a task arriving at any moment is fixed and completely independent of when the previous task arrived. This is called a memoryless system, and it turns out that the time between arrivals in such a system has an exponential distribution: if the average arrival rate is λ events per unit time, then the time between arrivals is an exponential random variable with mean 1/λ. Real-world arrival rates are rarely this tidy, but it’s a good starting point.
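
One wrinkle worth flagging: random.expovariate takes the rate λ as its argument, and our simulator passes 1.0 / params["task_arrival_rate"], which makes task_arrival_rate the mean time between arrivals rather than a rate. A quick empirical check:

```python
import random

random.seed(12345)
task_arrival_rate = 2.0
# expovariate's argument is the rate λ, and the mean inter-arrival time is
# 1/λ, so passing 1.0 / task_arrival_rate gives a mean of task_arrival_rate
samples = [random.expovariate(1.0 / task_arrival_rate) for _ in range(100_000)]
mean = sum(samples) / len(samples)
```

With 100,000 samples, the observed mean lands very close to 2.0.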

Representing Tasks

We now have two bits of code to explore: the TaskUniform class that represents a task and the simulate_task function that models its behavior. The former just stores a unique ID and a duration, and knows how to represent itself as a string for printing:

class TaskUniform:
    """Task with uniformly-distributed durations."""

    _id = count()

    def __init__(self, params):
        self._id = next(TaskUniform._id)
        self._duration = random.uniform(1, params["max_task_duration"])

    def __str__(self):
        return f"task-{self._id}/{self._duration:.2f}"

Note that we’ve decided to model task durations as uniformly distributed between 1.0 and some maximum value. This might win a prize for “least realistic assumption”, but we’ll come back and adjust it later.

Simulating a Task

Finally, we get to actually simulating a task. As before, the fact that this function uses yield means that it creates a generator; I think that would have been clearer if a keyword like gen had been introduced in place of def, but it is what it is:

def simulate_task(env, developers, task):
    """Simulate a task flowing through the system."""

    available = developers.capacity - developers.count
    print(f"{env.now:.2f}: {task} arrives, {available} developers available")
    request_start = env.now
    with developers.request() as req:
        yield req
        delay = env.now - request_start
        print(f"{env.now:.2f}: {task} starts after delay {delay:.2f}")
        yield env.timeout(task._duration)
        print(f"{env.now:.2f}: {task} finishes")

Going through this phrase by phrase:

  1. The task starts by reporting how many developers are available. (Remember, developers is a Resource with a fixed capacity; its .count tells us how much of that resource is currently being used.)
  2. The task then records the time at which it asked for a developer so that it can later report how long it had to wait. We don’t have to do this right at the start of the function (before checking the number of available developers) because we’re not reading the computer’s actual clock: env.now gives us the current simulated time in the SimPy environment, and that time only advances when processes tell the framework that they want it to.
  3. We then ask for a developer using developers.request(). This method gives us an object that we store in req, which we immediately yield to tell SimPy what we want. If the request can be satisfied right away, SimPy immediately reschedules this process; if the resource is already at capacity, SimPy suspends this process until a developer is available.
  4. …which means that as soon as execution passes the yield req line, we know that a resource is available. We print a message to report how long the task had to wait for a developer…
  5. …and then yield the object created by calling env.timeout with the task’s duration to tell SimPy that the task needs that much time to complete.
  6. Finally, we print another message to report when the task finished…
  7. …and give the developer back to the developer pool. This isn’t visible in the code: it’s done automatically when we reach the end of the with block, in the same way that a file opened in a with is automatically closed at the block’s end.

A Sample Run

Let’s run the simulation with the following parameters:

PARAMS = {
    "max_task_duration": 10.0,
    "task_arrival_rate": 2.0,
    "num_developers": 2,
    "simulation_duration": 20,
    "random_seed": 12345,
}

The output is:

1.08: task-0/1.09 arrives, 2 developers available
1.08: task-0/1.09 starts after delay 0.00
2.17: task-0/1.09 finishes
4.57: task-1/3.69 arrives, 2 developers available
4.57: task-1/3.69 starts after delay 0.00
5.49: task-2/2.74 arrives, 1 developers available
5.49: task-2/2.74 starts after delay 0.00
7.15: task-3/2.46 arrives, 0 developers available
7.42: task-4/4.90 arrives, 0 developers available
8.23: task-2/2.74 finishes
8.23: task-3/2.46 starts after delay 1.07
8.25: task-1/3.69 finishes
8.25: task-4/4.90 starts after delay 0.83
9.07: task-5/2.57 arrives, 0 developers available
10.68: task-6/4.19 arrives, 0 developers available
10.68: task-3/2.46 finishes
10.68: task-5/2.57 starts after delay 1.61
13.15: task-4/4.90 finishes
13.15: task-6/4.19 starts after delay 2.47
13.25: task-5/2.57 finishes
17.03: task-7/1.82 arrives, 1 developers available
17.03: task-7/1.82 starts after delay 0.00
17.34: task-6/4.19 finishes
18.85: task-7/1.82 finishes

Things are certainly happening: tasks are arriving and either starting immediately or waiting until a developer is available. But is our code correct? And what does it tell us about how long tasks take to complete and how busy developers are? To answer those questions, we need to collect some data and do a bit of analysis. Stay tuned…

Starting to Simulate

Inspired in part by the software development simulator that Elisabeth Hendrickson and friends built, I’ve been toying with the idea of using SimPy to create something that could be used in a half-day workshop to (a) show people how powerful discrete event simulation can be, and (b) demonstrate why simple measurements of developers’ work are almost always misleading. It’s been thirty-five years since I did this kind of simulation, SimPy uses some features of Python that I haven’t kept up with, and I’ve never taught this topic before, so I’m planning to do a few blog posts about how it works and what I’m learning.

Generators

A generator is a Python function that creates an object which uses its own stack frame to maintain persistent state between calls. That probably doesn’t make any sense unless you already understand it, so here’s an example:

def gen_char_from_string(text):
    i = 0
    while i < len(text):
        yield text[i]
        i += 1

The magic is the keyword yield, which tells Python that this function creates a generator. (I think the code would be easier to understand if a keyword like gen had to be used instead of def, but that ship sailed a long time ago.) We can use our generator like this:

gen = gen_char_from_string("one")
try:
    i = 0
    while True:
        ch = next(gen)
        print(f"{i}: {ch}")
        i += 1
except StopIteration:
    print("ended by exception")

As this code shows, calling gen_char_from_string creates an object rather than immediately returning a value. Each time we call next on that object, it advances to the next yield. The generator object preserves its stack frame, so the local variables text and i persist from one invocation to the next. When execution reaches the end of the generator, Python throws a StopIteration exception to tell us it’s done.

Why go to all this trouble? First, for loops understand how to interact with generators, so we can rewrite the snippets above as:

def gen_char_from_string(text):
    for ch in text:
        yield ch

characters = [ch for ch in gen_char_from_string("two")]
print(f"result as list: {characters}")

Second, we can do things like create “infinite” generators:

def gen_infinite(text):
    pos = 0
    while True:
        yield text[pos]
        pos = (pos + 1) % len(text)


for (i, ch) in enumerate(gen_infinite("three")):
    if i > 9:
        break
    print(i, ch)

0 t
1 h
2 r
3 e
4 e
5 t
6 h
7 r
8 e
9 e

or generate the cross-product of two arbitrary collections:

def gen_combinations(left, right):
    for left_item in left:
        for right_item in right:
            yield (left_item, right_item)


for pair in gen_combinations("abc", [1, 2, 3]):
    print(pair)

which prints:

('a', 1)
('a', 2)
('a', 3)
('b', 1)
('b', 2)
('b', 3)
('c', 1)
('c', 2)
('c', 3)
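
Here too the standard library has us covered: itertools.product yields the same cross-product lazily, and generalizes to any number of collections:

```python
import itertools

# itertools.product is a lazy cross-product, equivalent to
# gen_combinations above for two collections.
pairs = list(itertools.product("abc", [1, 2, 3]))
for pair in pairs:
    print(pair)
```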

We can even store our generators in a list (after all, they’re just objects) and then use them however we want:

def alternate(processes):
    while True:
        try:
            for proc in processes:
                yield next(proc)
        except StopIteration:
            break

def seq(values):
    for v in values:
        yield v

sequences = [seq("ab"), seq("123")]
for thing in alternate(sequences):
    print(thing)

which prints:

a
1
b
2
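
Notice that the trailing "3" never appears: the first StopIteration ends the whole interleaving. If stopping at the shortest sequence is the behavior we want, the built-in zip gives it to us for free; flattening its pairs reproduces alternate’s output for these two sequences:

```python
# zip stops as soon as the shortest input runs out, so flattening its
# pairs interleaves the sequences and drops the trailing "3",
# just like alternate does.
interleaved = [item for pair in zip("ab", "123") for item in pair]
for thing in interleaved:
    print(thing)
```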

Discrete Event Simulation

Discrete event simulation models a system as a series of events, each of which occurs at a particular instant in time. We can simulate such a system by advancing the clock one tick at a time, but it’s more efficient to have each process tell us when it’s next going to do something and then jump ahead to the least of those times.

Note: we’re using the word “process” to mean “something active” rather than “an operating system process”. I prefer the word “actor”, but “process” is pretty deeply embedded in discrete event simulation terminology.
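
To make the "jump ahead to the least of those times" idea concrete, here is a toy sketch (not how SimPy is actually implemented) that keeps pending events in a priority queue and pops them in time order:

```python
import heapq

# A toy event loop: each entry is (time, sequence_number, description).
# The sequence number breaks ties so heapq never compares descriptions.
events = []
for seq, (time, what) in enumerate(
    [(3.0, "B finishes"), (1.0, "A finishes"), (2.0, "A starts next job")]
):
    heapq.heappush(events, (time, seq, what))

history = []
while events:
    now, _, what = heapq.heappop(events)  # jump straight to the next event
    history.append((now, what))
    print(f"T={now} {what}")
```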

SimPy handles all these details for us. All we have to do is:

  1. Create an environment that stores all the SimPy-ish stuff like the list of running processes.
  2. Give it some processes to run.
  3. Tell it to run the simulation until all the processes are done.

import simpy

def main():
    env = simpy.Environment()
    env.process(worker(env, worker_id=1, num_jobs=3, job_time=2))
    env.process(worker(env, worker_id=2, num_jobs=3, job_time=3))
    env.run()
    print(f"Simulation completed at T={env.now}")

The key thing here is that worker isn’t a normal function call. Instead of handing a value to env.process, it creates a generator for the environment to run. That generator yields env.timeout(delay) to tell SimPy that it wants to wait for a certain length of time, e.g., to simulate being busy. (Processes can yield other things as well—we’ll see some in future posts.)

def worker(env, worker_id, num_jobs, job_time):
    """Actor with fixed-time jobs."""
    print(f"T={env.now} worker {worker_id} starts")

    for job_num in range(num_jobs):
        print(f"T={env.now} worker {worker_id} starts job {job_num} duration {job_time}")
        yield env.timeout(job_time)
        print(f"T={env.now} worker {worker_id} finishes job {job_num}")

    print(f"T={env.now} worker {worker_id} finishes")

Here’s the output:

T=0 worker 1 starts
T=0 worker 1 starts job 0 duration 2
T=0 worker 2 starts
T=0 worker 2 starts job 0 duration 3
T=2 worker 1 finishes job 0
T=2 worker 1 starts job 1 duration 2
T=3 worker 2 finishes job 0
T=3 worker 2 starts job 1 duration 3
T=4 worker 1 finishes job 1
T=4 worker 1 starts job 2 duration 2
T=6 worker 2 finishes job 1
T=6 worker 2 starts job 2 duration 3
T=6 worker 1 finishes job 2
T=6 worker 1 finishes
T=9 worker 2 finishes job 2
T=9 worker 2 finishes
Simulation completed at T=9

Next Steps

A couple of workers doing fixed-size jobs, completely independently, isn’t a particularly interesting scenario. The next post in this series will look at two things:

  1. How do we make a more realistic simulation?
  2. How do we create more comprehensible output?

These questions are equally important because there’s no point building something whose results we can’t understand (and check). Stay tuned…

End of Life Ideas

Years ago, Mike Hoye suggested that we should retire Unix usernames in the way that sports teams retire jersey numbers, and for the same reasons. I was reminded of this while doing interviews with people about how projects end. A couple of other ideas that have come up are adding an -eol (end-of-life) modifier to semantic versioning, similar to the -beta modifier, so that 1.2.3-eol would mean “this is the last release this project is going to do”, and holding wakes for projects so that people can celebrate the contributions they did make and the impact they did have. If you have other suggestions or rituals, please get in touch.

Cognitive Pollution

Update: I have posted a recording of the talk.

I am grateful to Prof. Alberto Bacchelli for inviting me to give a colloquium at the University of Zurich a couple of days ago. My talk was titled Cocaine and Conway’s Law, but what it was really about was how to teach young software developers about cognitive pollution.

When most existing software engineering courses talk about risk, they talk about single-point incidents like the Therac-25 or Ariane 5 flight V88 where there is a direct cause-and-effect relationship between a mistake in software and something bad happening. I think we should instead talk about things like tetraethyl lead and asbestos, or about Purdue Pharma’s role in the opioid epidemic and the fossil fuels lobby’s campaign to promote climate change denial.

In each of these cases, deliberate choices increased the general level of risk to hundreds of millions of people, but the statistical nature of that risk allowed those responsible to avoid personal accountability. I think that case studies like these will help learners understand things like the role that Meta’s algorithms played in the Rohingya genocide and think more clearly about scenarios like the one below:

It is 2035. Most men under 20 have learned what they “know” about what women like from AI chatbots trained on porn and tuned to maximize engagement. As a result, many of them believe that a frightened “no” actually means “yes please”. The people who have earned billions from these systems cannot legally be held accountable for their users’ actions.

How does that make you feel?

If you are currently teaching an undergraduate course that covers cases like these, please get in touch: I’d be grateful for a chance to learn from you.