Cell Segmentation in Microscopy Images

The Problem

Representing a Fluorescence Image

$$I(x, y) = B \exp\!\left(-\frac{(x - x_c)^2 + (y - y_c)^2}{2\sigma^2}\right)$$

SEED = 7493418  # RNG seed
IMAGE_SIZE = 128  # pixels per side; kept small for fast tests
N_CELLS = 8  # number of cells to place
CELL_SIGMA = 6  # Gaussian half-width of each cell (pixels)
CELL_BRIGHTNESS = 1.0  # peak intensity (image values are in [0, 1])
NOISE_SCALE = 0.15  # std dev of additive Gaussian noise

# Minimum centre-to-centre separation that keeps midpoint intensity below the
# operating threshold of 0.5.  For two Gaussians of amplitude A and sigma σ
# separated by distance d, the midpoint value is 2A·exp(−d²/(8σ²)).
# Setting this below 0.5 gives d > 2σ·sqrt(2·ln(4A)) = 2·6·sqrt(2·ln(4)) ≈ 19.8.
# We use 24 pixels for a comfortable margin.
MIN_SEPARATION = 24
def make_image(
    n_cells=N_CELLS,
    sigma=CELL_SIGMA,
    brightness=CELL_BRIGHTNESS,
    noise_scale=NOISE_SCALE,
    image_size=IMAGE_SIZE,
    seed=SEED,
):
    """Return (image, centers) for a synthetic fluorescence image.

    Each cell is a 2-D Gaussian with peak `brightness` and width `sigma`.
    Independent Gaussian noise with std dev `noise_scale` is added to every
    pixel.  Pixel values are clipped to [0, 1].  `centers` is a list of
    (x, y) pixel coordinates of the true cell centres.
    """
    rng = np.random.default_rng(seed)
    centers = _place_cells(n_cells, image_size, MIN_SEPARATION, rng)

    ys, xs = np.mgrid[0:image_size, 0:image_size]
    image = np.zeros((image_size, image_size))
    for cx, cy in centers:
        image += brightness * np.exp(
            -0.5 * ((xs - cx) ** 2 + (ys - cy) ** 2) / sigma**2
        )
    image += rng.normal(0.0, noise_scale, image.shape)
    image = np.clip(image, 0.0, None)  # negative counts are unphysical
    return image, centers

Step 1: Smoothing

$$B_\text{smooth} = B \cdot \frac{\sigma^2}{\sigma^2 + \sigma_s^2}$$

def smooth(image, sigma=SMOOTH_SIGMA):
    """Return the image after 2-D Gaussian smoothing with the given sigma."""
    return ndimage.gaussian_filter(image.astype(float), sigma=sigma)

A cell has peak brightness $B = 1.0$ and Gaussian width $\sigma = 6$ pixels. After smoothing with a Gaussian filter of width $\sigma_s = 2$ pixels, the smoothed peak brightness is $B \cdot \sigma^2 / (\sigma^2 + \sigma_s^2)$. What is the result? Give your answer to two decimal places.

Step 2: Thresholding and Labelling

def segment(image, threshold=THRESHOLD, min_size=MIN_CELL_SIZE):
    """Return (labeled, n_cells) from a smoothed image.

    Steps:
    1. Threshold to a binary mask (pixels above `threshold` → True).
    2. Label connected components with scipy.ndimage.label.
    3. Discard components smaller than `min_size` pixels.

    `labeled` is an integer array where 0 is background and 1..n_cells
    are the detected cells.  Components are re-numbered consecutively
    after size filtering so the label values are always contiguous.
    """
    binary = image > threshold
    raw_labeled, n_raw = ndimage.label(binary)

    # Filter by size and re-number labels.
    filtered = np.zeros_like(raw_labeled)
    new_label = 1
    for old_label in range(1, n_raw + 1):
        if np.sum(raw_labeled == old_label) >= min_size:
            filtered[raw_labeled == old_label] = new_label
            new_label += 1

    return filtered, new_label - 1
def cell_sizes(labeled):
    """Return an array of pixel counts, one per detected cell."""
    n = labeled.max()
    return np.array([np.sum(labeled == i) for i in range(1, n + 1)])

Put these cell-segmentation steps in the correct order.

  1. Place Gaussian blobs at random non-overlapping positions
  2. Add Gaussian noise and clip pixel values to zero
  3. Apply 2-D Gaussian smoothing
  4. Threshold the smoothed image to produce a binary mask
  5. Label connected components with scipy.ndimage.label
  6. Discard components smaller than MIN_CELL_SIZE pixels

Choosing the Threshold

$$\sigma_\text{noise,smooth} = \frac{\sigma_\text{noise}}{2\sqrt{\pi}\,\sigma_s} \approx \frac{0.15}{2\sqrt{\pi}\cdot 2} \approx 0.021$$

SMOOTH_SIGMA = 2.0  # std dev of the Gaussian smoothing kernel (pixels)

# After smoothing, cell peak brightness ≈ σ²/(σ²+σ_s²) = 36/40 ≈ 0.90.
# Threshold 0.5 sits comfortably below 0.90 and well above the smoothed-noise
# ceiling (≈ 5× the smoothed-noise std dev of 0.021; see lesson for derivation).
THRESHOLD = 0.50

# Expected cell area at the operating threshold ≈ π·r_t² where
# r_t = sqrt(−2·ln(threshold/peak)) · σ_eff ≈ 1.085 · 6.32 ≈ 6.9 px, area ≈ 148 px².
# MIN_CELL_SIZE is set well below this to keep real cells while removing noise blobs.
MIN_CELL_SIZE = 30  # minimum pixel count for a detected cell

MIN_SEPARATION is set to 24 pixels rather than a smaller value. Why?

scipy.ndimage.label cannot separate components closer than 24 pixels
Wrong: label works at pixel level and separates any two components that do not touch.
At smaller separations the midpoint intensity between two adjacent cells exceeds the threshold, merging them into one component
Correct: two Gaussians 24 px apart have midpoint intensity ≈ 0.21, safely below the threshold of 0.50; closer cells would merge.
The Gaussian smoothing filter blurs cells together unless they are at least $4\sigma$ apart
Wrong: SMOOTH_SIGMA = 2, so $4\sigma = 8$ pixels — much less than 24; blurring is not the limiting factor.
24 pixels is the minimum required by the physical image resolution in microns
Wrong: the separation is derived from the threshold and cell brightness, not from physical calibration of the microscope.

What the Segmenter Finds in Pure Noise

def make_pure_noise(noise_scale=NOISE_SCALE, image_size=IMAGE_SIZE, seed=SEED):
    """Return an image containing only Gaussian noise, no cell signal.

    Used to demonstrate the false-positive behaviour of the segmentation
    pipeline when the threshold is set below the noise floor.
    """
    rng = np.random.default_rng(seed)
    return np.maximum(rng.normal(0.0, noise_scale, (image_size, image_size)), 0.0)
Pure-noise image with 4 false cell detections shown in colour.
Figure 1: Pure Gaussian noise (no cells) smoothed and segmented at threshold 0.05 with no size filter. Four spurious regions are labelled. The same image at the operating threshold of 0.50 produces zero detections.
Left: raw synthetic image with 8 Gaussian blobs. Right: segmented image with each cell in a different colour and true centres marked with red crosses.
Figure 2: Eight synthetic cells recovered by the segmentation pipeline. All 8 are found; cell areas range from 143 to 150 px², close to the analytic value of 148 px².

The noise demonstration shows 4 false cells at threshold 0.05 (no size filter) and 0 false cells at threshold 0.50. What roles do the threshold and MIN_CELL_SIZE play?

The size filter is the primary defence; the threshold only speeds up computation
Wrong: without an adequate threshold, large noise clusters pass the size filter too; the threshold does the main work.
The threshold and the size filter are equally important and interchangeable
Wrong: they address different failure modes — one filters by intensity, the other by area — and are not interchangeable.
The threshold is irrelevant as long as the size filter is strict enough
Wrong: a very low threshold lets many noise pixels cluster into large components that pass the size filter.
The threshold is the primary defence; the size filter removes small noise blobs that slip past a well-chosen threshold
Correct: at 0.50 $\approx 24\sigma$, noise exceedances are vanishingly rare; the size filter is a secondary backstop for the few that remain.

Testing

Noise reduction

Shape preservation

Constant image

Cell count

Cell sizes

Empty result above maximum

Pure noise

import numpy as np
from generate_image import make_image, make_pure_noise, N_CELLS, IMAGE_SIZE
from cellseg import smooth, segment, cell_sizes, THRESHOLD


def test_smooth_reduces_noise():
    # Gaussian smoothing must reduce pixel-to-pixel variance.
    rng = np.random.default_rng(7)
    noise = rng.normal(0, 1, (64, 64))
    assert smooth(noise, sigma=2).var() < noise.var()


def test_smooth_preserves_shape():
    image = np.ones((IMAGE_SIZE, IMAGE_SIZE))
    assert smooth(image).shape == image.shape


def test_smooth_constant_unchanged():
    # Smoothing a uniform image must leave interior values unchanged.
    image = np.full((50, 50), 0.7)
    result = smooth(image, sigma=2)
    interior = result[4:-4, 4:-4]
    assert np.allclose(interior, 0.7, atol=1e-6)


def test_segment_finds_all_cells():
    # With default parameters the pipeline must recover all N_CELLS cells.
    image, centers = make_image()
    labeled, n_found = segment(smooth(image))
    assert n_found == N_CELLS, f"Expected {N_CELLS} cells, found {n_found}"


def test_cell_sizes_reasonable():
    # Every detected cell must have a pixel area in the physically plausible range.
    # At the operating threshold the analytic area is ≈ 148 px²; we allow 50–350 px²
    # to accommodate smoothing and noise effects without being unnecessarily tight.
    image, _ = make_image()
    labeled, _ = segment(smooth(image))
    sizes = cell_sizes(labeled)
    assert np.all(sizes >= 50), f"Cell too small: {sizes.min()} px"
    assert np.all(sizes <= 350), f"Cell too large: {sizes.max()} px"


def test_threshold_above_max_finds_nothing():
    # A threshold above the maximum smoothed value must produce zero cells.
    image, _ = make_image()
    labeled, n_found = segment(smooth(image), threshold=999.0)
    assert n_found == 0


def test_pure_noise_false_positives():
    # At a threshold of 0.05, pure noise produces false detections.
    # Smoothed noise std dev ≈ NOISE_SCALE / (2·sqrt(π)·SMOOTH_SIGMA) ≈ 0.021;
    # 0.05 ≈ 2.4σ, so roughly 0.8% of pixels exceed it, forming several blobs.
    noise = make_pure_noise()
    _, n_false = segment(smooth(noise), threshold=0.05)
    assert n_false > 0, "Expected false positives from pure noise at threshold 0.05"

    # At the operating threshold (0.50 ≈ 24σ) pure noise must not trigger any cell.
    _, n_safe = segment(smooth(noise), threshold=THRESHOLD)
    assert n_safe == 0, "Operating threshold should reject all noise blobs"

Exercises

Watershed segmentation

The thresholding approach fails when two cells overlap and merge into a single connected component. Implement watershed segmentation: compute the distance transform of the binary mask (scipy.ndimage.distance_transform_edt), find local maxima of the distance map as seed points, and use scipy.ndimage.watershed_ift (or skimage.segmentation.watershed) to split overlapping regions. Demonstrate that it correctly separates two cells whose centres are only 14 pixels apart (closer than MIN_SEPARATION).

Intensity-based cell classification

Add a second cell type to make_image: dim cells with brightness=0.4 alongside the standard bright cells with brightness=1.0. Extend segment to return a label array and a list of per-cell mean intensities, then classify each detected cell as bright or dim using a simple intensity threshold. Report precision and recall for each class.

Effect of noise on count accuracy

Generate 50 images for each of five noise levels: $\sigma_\text{noise} \in {0.05, 0.10, 0.15, 0.20, 0.25}$. Run the full pipeline on each image and record the number of cells found. Plot mean and standard deviation of the cell count as a function of noise level and identify the noise level at which the pipeline begins to fail consistently.

Adaptive thresholding

Real microscopy images often have uneven illumination: the background is brighter in some regions than others. Simulate this by adding a low-frequency gradient $G(x,y) = 0.2 \cdot x / \text{IMAGE_SIZE}$ to the synthetic image before adding noise. Show that the global threshold misses cells in dark regions or over-detects in bright regions, then implement local thresholding (scipy.ndimage.generic_filter with a percentile-based local estimate) and show that it restores accurate cell counts.