Performance, Concurrency, and System Interaction

Repeated Scans

Run the word-frequency function on the provided text file and time how long it takes. Then run cProfile on it to see which lines consume the most time.

import sys
import time


def most_common_slow(text):
    """Return the most common word using a repeated full-text scan per word."""
    words = text.split()
    unique = set(words)
    best_word, best_count = None, 0
    for word in unique:
        count = text.count(word)
        if count > best_count:
            best_count, best_word = count, word
    return best_word, best_count


def generate_text(num_words=50_000, vocab_size=500):
    """Generate a reproducible text with a known vocabulary for timing tests."""
    import random
    random.seed(42)
    vocab = [f"word{i}" for i in range(vocab_size)]
    return " ".join(random.choice(vocab) for _ in range(num_words))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
        print(f"Loaded {len(text.split())} words from {sys.argv[1]}")
    else:
        print("No file given; generating 50,000-word test text…")
        text = generate_text()

    start = time.perf_counter()
    word, count = most_common_slow(text)
    elapsed = time.perf_counter() - start

    print(f"Most common: {word!r} ({count} times)")
    print(f"Slow method: {elapsed:.3f}s")
    print()
    print("Fix: replace the loop with collections.Counter(text.split()).most_common(1)")
Sample text (save it to a file and pass the file's path as the first argument):

the quick brown fox jumps over the lazy dog the fox was very quick
and the dog was very lazy the brown fox leaped high over the sleeping dog
a quick brown fox is quicker than a lazy brown dog or so they say
the dog slept under the tree while the fox ran quickly through the field
over and over the fox jumped and the dog slept on and on
Explanation

The bug is calling text.count(word) for every unique word, re-scanning the entire text each time. (str.count also matches substrings, so "quick" is tallied inside "quicker" and "quickly", which can make the answer wrong as well as slow.) On a file of 50,000 words it takes several seconds, while a single pass with collections.Counter is nearly instant. Teaches how to identify repeated-scan inefficiency with cProfile and how choosing the right data structure eliminates the need for multiple passes.
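As a sketch of the suggested fix, a single-pass version using collections.Counter:

```python
from collections import Counter


def most_common_fast(text):
    """Return the most common word using one pass over the text."""
    # Counter tallies all words in a single O(n) pass, and
    # most_common(1) returns the top (word, count) pair.
    counts = Counter(text.split())
    return counts.most_common(1)[0]


print(most_common_fast("the fox and the dog and the cat"))  # prints ('the', 3)
```

Unlike str.count, Counter tallies whole words only, so "quick" is no longer counted inside "quicker" or "quickly".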

Subprocess Waiting for Input

Run this script. Does it return promptly, or does it hang?

import subprocess
import sys


def count_words(text):
    """Count words in text by passing it to a child Python process."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; print(len(sys.stdin.read().split()))"],
        stdout=subprocess.PIPE,
    )
    try:
        stdout, _ = proc.communicate(timeout=3)
        return int(stdout.strip())
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return None


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(f"Text: {sample!r}")
    print(f"Expected word count: {len(sample.split())}")
    print("Calling count_words()… (will time out in 3 s if stdin is not piped)")
    result = count_words(sample)
    if result is None:
        print("Timed out — child was waiting for stdin that was never provided")
    else:
        print(f"Word count: {result}")
Explanation

The bug is that the child process inherits the parent's stdin and blocks waiting for input that is never written; without the three-second timeout guard, communicate() would block the parent forever. The fix is to open a pipe with stdin=subprocess.PIPE and pass the text via communicate(input=...). Teaches how subprocess I/O streams work and how to use communicate() safely.
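A fixed sketch that pipes the text into the child's stdin instead of leaving the child waiting:

```python
import subprocess
import sys


def count_words_fixed(text):
    """Count words in text by piping it to a child Python process."""
    proc = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; print(len(sys.stdin.read().split()))"],
        stdin=subprocess.PIPE,   # the missing piece: give the child a pipe
        stdout=subprocess.PIPE,
        text=True,               # exchange str rather than bytes
    )
    # communicate() writes the input, closes the pipe (so the child
    # sees EOF and stops reading), then collects stdout until exit.
    stdout, _ = proc.communicate(input=text)
    return int(stdout.strip())


print(count_words_fixed("the quick brown fox jumps"))  # prints 5
```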

Race Condition in Shared Counter

Run this script several times and record the final counter value each time. Is the value always the same? Is it always the value you expect?

import threading

THREADS = 5
INCREMENTS = 100_000

counter = 0


def increment():
    global counter
    for _ in range(INCREMENTS):
        counter = counter + 1


if __name__ == "__main__":
    threads = [threading.Thread(target=increment) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    expected = THREADS * INCREMENTS
    print(f"Expected: {expected}")
    print(f"Got:      {counter}")
    if counter != expected:
        print(f"Lost {expected - counter} increments to the race condition")
    else:
        print("(got lucky this run — try again or increase INCREMENTS)")
Explanation

The bug is a race condition caused by unsynchronized read-modify-write, so multiple threads updating a shared counter produce wrong totals. Teaches what a race condition is, why it is hard to reproduce, and how to use threading.Lock to fix it.
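A minimal sketch of the threading.Lock fix (with a smaller INCREMENTS, since the locked loop runs slower):

```python
import threading

THREADS = 5
INCREMENTS = 10_000

counter = 0
lock = threading.Lock()


def increment():
    global counter
    for _ in range(INCREMENTS):
        # The lock makes the read-modify-write atomic: only one
        # thread at a time may execute the guarded statement.
        with lock:
            counter += 1


threads = [threading.Thread(target=increment) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # prints 50000 on every run
```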

Multiprocessing Memory Model

Run this script and compare the contents of the shared list before and after the worker processes run. Did the workers modify the list you passed in?

import multiprocessing

results = []


def collect(value):
    results.append(value)


if __name__ == "__main__":
    processes = [
        multiprocessing.Process(target=collect, args=(i,))
        for i in range(5)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    print(f"results: {results}")
    print("expected: [0, 1, 2, 3, 4] in some order")
    print()
    print("Each child process gets its own copy of memory.")
    print("Fix: use multiprocessing.Manager().list() or return values via a Queue.")
Explanation

The bug is that each process has its own copy of memory, so changes made inside child processes are not visible in the parent's shared list. Teaches the difference between threading and multiprocessing memory models.

Mock Patched at Wrong Location

Run the test. Does it pass? Add a print statement inside the mock to check whether the mock is actually being called. Then look at the return value of the function under test.

# mockpatch_source.py
def fetch_data():
    """Return records from the data store."""
    return ["alpha", "beta", "gamma"]


# mockpatch_user.py
from mockpatch_source import fetch_data


def process():
    """Fetch records and return them uppercased."""
    return [item.upper() for item in fetch_data()]


# test script
from unittest.mock import patch

import mockpatch_user


def test_process():
    with patch("mockpatch_source.fetch_data", return_value=["mock", "data"]):
        result = mockpatch_user.process()

    print(f"Result:   {result}")
    print("Expected: ['MOCK', 'DATA']")
    assert result == ["MOCK", "DATA"], (
        f"Mock had no effect — got {result!r}\n"
        "Fix: patch 'mockpatch_user.fetch_data' (where it is used), "
        "not 'mockpatch_source.fetch_data' (where it is defined)"
    )


if __name__ == "__main__":
    try:
        test_process()
        print("PASS")
    except AssertionError as e:
        print(f"FAIL: {e}")
Explanation

The bug is patching where the function is defined instead of where it is imported, so unittest.mock.patch has no effect on the code under test. Teaches how Python's import system works and where mocks must be applied.
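The corrected test patches the name where it is used. The sketch below rebuilds the two modules in memory with types.ModuleType so it runs standalone; in the exercise they are the real mockpatch_source.py and mockpatch_user.py files:

```python
import sys
import types
from unittest.mock import patch

# Recreate the two modules in memory so this sketch is self-contained.
source = types.ModuleType("mockpatch_source")
exec("def fetch_data():\n    return ['alpha', 'beta', 'gamma']",
     source.__dict__)
sys.modules["mockpatch_source"] = source

user = types.ModuleType("mockpatch_user")
exec("from mockpatch_source import fetch_data\n"
     "def process():\n"
     "    return [item.upper() for item in fetch_data()]",
     user.__dict__)
sys.modules["mockpatch_user"] = user

# Patch the reference process() actually looks up: the copy of
# fetch_data bound in mockpatch_user, not the original definition.
with patch("mockpatch_user.fetch_data", return_value=["mock", "data"]):
    result = sys.modules["mockpatch_user"].process()

print(result)  # prints ['MOCK', 'DATA']
```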