Percolation in a Porous Medium

The Problem

The Grid Model

GRID_SIZE = 20  # side length of the square grid (cells)
N_TRIALS = 200  # independent trials per probability value
N_PROBS = 41  # number of probability values from 0.0 to 1.0 inclusive
RNG_SEED = 7493418

# Theoretical site-percolation threshold for a 2D square lattice with
# 4-neighbor (von Neumann) connectivity.  From Stauffer & Aharony,
# Introduction to Percolation Theory (2nd ed., 1994), Appendix B.
THRESHOLD_THEORY = 0.5927
def make_grid(p, rng):
    """Return a GRID_SIZE x GRID_SIZE boolean array.

    Each cell is independently True (open) with probability p.
    """
    return rng.random((GRID_SIZE, GRID_SIZE)) < p

The expression rng.random((GRID_SIZE, GRID_SIZE)) < p produces a boolean grid. What fraction of its cells are True on average?

$1 - p$
Wrong: cells with a uniform random value less than p are True with probability p, not 1-p.
$p$
Correct: each cell is drawn from Uniform(0, 1) and is less than p with probability p.
$p^2$
Wrong: the cells are independent; their joint probability involves products only when all must be True simultaneously.
$\sqrt{p}$
Wrong: the comparison < p gives probability p directly, with no square root.

Depth-First Search

def percolates(grid):
    """Return True if an open path connects any top-row cell to any bottom-row cell.

    Uses iterative depth-first search with 4-neighbor (von Neumann) connectivity.
    All open cells in row 0 are added to the stack as starting points.
    The search succeeds as soon as any visited cell lies in the last row.
    """
    nrows, ncols = grid.shape
    visited = np.zeros_like(grid, dtype=bool)

    stack = [(0, c) for c in range(ncols) if grid[0, c]]
    for r, c in stack:
        visited[r, c] = True

    while stack:
        r, c = stack.pop()
        if r == nrows - 1:
            return True
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < nrows
                and 0 <= nc < ncols
                and grid[nr, nc]
                and not visited[nr, nc]
            ):
                visited[nr, nc] = True
                stack.append((nr, nc))
    return False

Put these steps in the correct order for the iterative DFS percolation test.

  1. Collect all open cells in row 0 and push them onto the stack; mark them as visited
  2. Pop a cell (r, c) from the stack
  3. If r equals the last row index, return True (percolation found)
  4. For each unvisited open 4-neighbor, mark it visited and push it onto the stack
  5. If the stack is empty and the last row was never reached, return False

Why must cells be marked as visited before they are popped from the stack, rather than after?

Marking after popping is a valid alternative that produces the same result.
Wrong: marking after popping allows the same cell to be pushed multiple times, making the algorithm O(N^4) instead of O(N^2).
Marking before popping prevents a cell from being pushed onto the stack more than once.
Correct: once a cell is marked visited, no future neighbor scan will add it again, bounding the stack size by the number of cells.
The visited array must be filled in top-to-bottom order to work correctly.
Wrong: the visited array is indexed by (row, column), not traversal order; order of marking does not matter.
Marking after popping would cause the algorithm to miss cells in the last row.
Wrong: the check r == nrows - 1 happens after popping; the order of marking does not affect which cells are checked.

Sweeping the Probability

def sweep():
    """Return a DataFrame with the percolation fraction at each probability.

    For each p in a uniform grid from 0 to 1, runs N_TRIALS independent
    experiments and records the fraction that percolate.  All trials share
    a single RNG to ensure reproducibility from RNG_SEED.
    """
    rng = np.random.default_rng(RNG_SEED)
    probs = np.linspace(0.0, 1.0, N_PROBS)
    fractions = [
        sum(percolates(make_grid(p, rng)) for _ in range(N_TRIALS)) / N_TRIALS
        for p in probs
    ]
    return pl.DataFrame({"probability": probs, "fraction_percolating": fractions})
def estimate_threshold(df):
    """Return the smallest p where the percolation fraction first reaches 0.5.

    On a finite grid the percolation transition is sharp but not a true step
    function.  The crossing of 0.5 is a conventional and reproducible
    estimator of the finite-size threshold.
    """
    above = df.filter(pl.col("fraction_percolating") >= 0.5)
    return float(above["probability"].min()) if not above.is_empty() else float("nan")

Visualizing the Results

def plot(df):
    """Return an Altair line chart of percolation fraction vs. open probability."""
    return (
        alt.Chart(df)
        .mark_line(point=True)
        .encode(
            x=alt.X("probability:Q", title="Open-cell probability p"),
            y=alt.Y(
                "fraction_percolating:Q",
                title="Fraction of trials that percolate",
                scale=alt.Scale(domain=[0.0, 1.0]),
            ),
        )
        .properties(width=400, height=300, title="Percolation threshold sweep")
    )
S-shaped curve: fraction percolating vs. open probability. The curve rises steeply near p = 0.59.
Figure 1: Percolation fraction vs. open-cell probability p on a 20 x 20 grid, 200 trials per point. The transition is centred near the theoretical threshold p_c = 0.5927.

Testing

Boundary cases

Fraction in $[0, 1]$

Threshold near theory

import numpy as np
from porous import (
    RNG_SEED,
    THRESHOLD_THEORY,
    estimate_threshold,
    make_grid,
    percolates,
    sweep,
)


def test_empty_grid_never_percolates():
    # When p = 0 every cell is closed (False), so the stack is empty from
    # the start and the DFS returns False immediately.
    rng = np.random.default_rng(RNG_SEED)
    grid = make_grid(0.0, rng)
    assert not percolates(grid)


def test_full_grid_always_percolates():
    # When p = 1 every cell is open (True).  A straight vertical path
    # through column 0 connects row 0 to row GRID_SIZE - 1.
    rng = np.random.default_rng(RNG_SEED)
    grid = make_grid(1.0, rng)
    assert percolates(grid)


def test_single_row_grid():
    # A one-row grid percolates if and only if at least one cell in that row
    # is open, because the top row and the bottom row are the same row.
    open_row = np.array([[True, False, True]])
    assert percolates(open_row)
    closed_row = np.array([[False, False, False]])
    assert not percolates(closed_row)


def test_fractions_in_unit_interval():
    # Each trial either percolates or not, so the fraction must lie in [0, 1].
    df = sweep()
    assert (df["fraction_percolating"] >= 0.0).all()
    assert (df["fraction_percolating"] <= 1.0).all()


def test_threshold_near_theory():
    # On a 20 x 20 grid with 200 trials the finite-size threshold should be
    # within 0.07 of the infinite-lattice value 0.5927.  Finite-size effects
    # and Monte Carlo variance both contribute; a tolerance of 0.07 gives a
    # comfortable margin without being so wide that a wrong algorithm passes.
    df = sweep()
    p_c = estimate_threshold(df)
    assert abs(p_c - THRESHOLD_THEORY) < 0.07, (
        f"estimated threshold {p_c:.3f} is too far from theory {THRESHOLD_THEORY:.4f}"
    )

The theoretical percolation threshold for 2D site percolation on a square lattice with 4-neighbor connectivity is commonly cited as 0.5927. To three decimal places, what is THRESHOLD_THEORY as defined in porous.py?

Exercises

Larger grids

Increase GRID_SIZE to 50 and rerun sweep (you may need to reduce N_TRIALS to 50 to keep the run time reasonable). Does the threshold estimate get closer to 0.5927? Does the transition curve become steeper?

8-neighbor connectivity

Change the neighbor offsets in percolates from the 4-neighbor pattern to 8-neighbor (Moore neighborhood): add the four diagonal directions $(\pm 1, \pm 1)$. The theoretical threshold for 8-neighbor site percolation is approximately 0.407. Update THRESHOLD_THEORY and verify that test_threshold_near_theory still passes.

Bond percolation

In bond percolation, edges between adjacent cells are open with probability $p$ (rather than cells themselves). Implement a make_bond_grid function that creates a random boolean array for horizontal bonds and another for vertical bonds. Modify percolates to use bond arrays instead of a cell array. The theoretical threshold for 2D bond percolation on a square lattice is exactly $p_c = 0.5$. Estimate it numerically and compare.

Cluster size distribution

Instead of returning only a boolean, modify percolates to return the sizes of all connected open clusters found during the DFS. At $p = p_c$, the largest cluster grows as a power law of the grid size $N$. Run sweeps at several grid sizes and plot the largest cluster size vs. $N$ near the threshold.