Signal Detection Theory Analysis
The Problem
- In a detection experiment, an observer responds "yes" or "no" to each stimulus
- Some stimuli contain a signal, others do not
- Raw accuracy conflates two separate factors
- The observer's ability to detect the signal
- Their tendency to say "yes" regardless
- Framing the outcome as a confusion matrix keeps the two factors separate
| | Observer says "yes" | Observer says "no" |
|---|---|---|
| Signal present | Hit | Miss |
| Signal absent | False alarm | Correct rejection |
- The hit rate (HR) is
the proportion of signal trials on which the observer responds "yes"
- I.e., hits divided by total signal trials
- The false alarm rate (FAR) is
the proportion of noise trials on which the observer incorrectly responds "yes"
- False alarms divided by total noise trials
An observer runs 100 signal trials and 100 noise trials. They record 70 hits and 30 false alarms. What are their hit rate and false alarm rate?
- HR = 0.70, FAR = 0.30
- Correct: HR = 70/100 = 0.70 and FAR = 30/100 = 0.30.
- HR = 0.70, FAR = 0.70
- Wrong: the false alarm count (30) is divided by the number of noise trials (100), giving 0.30, not 0.70.
- HR = 70, FAR = 30
- Wrong: hit rate and false alarm rate are proportions between 0 and 1, not raw counts.
- HR = 0.30, FAR = 0.70
- Wrong: hits and false alarms have been swapped; hits come from signal trials and false alarms come from noise trials.
Computing Hit Rate and False Alarm Rate
import numpy as np

def confusion_rates(labels, decisions):
"""Return (hit_rate, false_alarm_rate) from binary label and decision arrays.
labels -- 1-D array of 1 (signal) or 0 (noise) for each trial
decisions -- 1-D array of 1 (responded yes) or 0 (responded no)
hit_rate = hits / total signal trials
false_alarm_rate = false alarms / total noise trials
"""
labels = np.asarray(labels)
decisions = np.asarray(decisions)
signal_trials = labels == 1
noise_trials = labels == 0
hits = np.sum((decisions == 1) & signal_trials)
false_alarms = np.sum((decisions == 1) & noise_trials)
hit_rate = hits / np.sum(signal_trials)
false_alarm_rate = false_alarms / np.sum(noise_trials)
return float(hit_rate), float(false_alarm_rate)
- `labels` is an array of 1 (signal) and 0 (noise) for each trial
- `decisions` is an array of 1 (responded yes) and 0 (responded no)
- The function counts hits and false alarms, then divides each by the appropriate total number of trials
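The worked quiz numbers above can be reproduced directly with NumPy. This standalone sketch builds label and decision arrays for 70 hits out of 100 signal trials and 30 false alarms out of 100 noise trials:

```python
import numpy as np

# 100 signal trials (70 "yes"), then 100 noise trials (30 "yes").
labels = np.concatenate([np.ones(100, dtype=int), np.zeros(100, dtype=int)])
decisions = np.concatenate([
    np.ones(70, dtype=int), np.zeros(30, dtype=int),  # responses on signal trials
    np.ones(30, dtype=int), np.zeros(70, dtype=int),  # responses on noise trials
])

hits = np.sum((decisions == 1) & (labels == 1))
false_alarms = np.sum((decisions == 1) & (labels == 0))
hr = hits / np.sum(labels == 1)           # 0.70
far = false_alarms / np.sum(labels == 0)  # 0.30
```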
Order the steps to compute the hit rate from experiment data.
Count the number of trials on which a signal was present. Count the number of those signal trials on which the observer responded "yes" (hits). Divide the hit count by the total number of signal trials.
The ROC Curve as a Threshold Sweep
- An observer does not simply say "yes" or "no"
- They have an internal numeric evidence score for each trial and say "yes" when that score exceeds a decision threshold
- By varying the threshold, we trace out different (FAR, HR) pairs
- A very high threshold means only very strong evidence triggers a "yes"
- Few false alarms, but also few hits
- A conservative observer
- A very low threshold means almost any evidence triggers a "yes"
- Many hits, but also many false alarms
- A liberal observer
- The ROC curve (Receiver Operating Characteristic) is the set of all (FAR, HR) pairs an observer can achieve by adjusting the threshold
- The diagonal line FAR = HR represents chance performance
- I.e., the observer gains no extra hits without an equal increase in false alarms
- A curve that bows toward the upper-left corner means that the observer can achieve high hit rates with low false alarm rates, i.e., better discrimination
import numpy as np

def roc_curve(scores, labels):
"""Return (far, hr) arrays tracing the ROC curve from evidence scores.
scores -- 1-D array of numeric evidence values, one per trial
labels -- 1-D array of 1 (signal) or 0 (noise) for each trial
The threshold sweeps over all unique score values plus a value just
above the maximum so that the curve starts near (0, 0). At each
threshold, a trial is classified as "yes" when its score is >= threshold.
The curve runs from near (0, 0) at the highest threshold to (1, 1) at
the lowest, tracing all (FAR, HR) pairs the observer can achieve.
"""
scores = np.asarray(scores, dtype=float)
labels = np.asarray(labels)
# Thresholds: from just above max down to min, covering the full range.
thresholds = np.sort(np.unique(scores))[::-1]
# Prepend a threshold above every score so the curve starts at (0, 0).
top = np.array([thresholds[0] + 1.0])
thresholds = np.concatenate([top, thresholds])
n_signal = np.sum(labels == 1)
n_noise = np.sum(labels == 0)
far = np.empty(len(thresholds))
hr = np.empty(len(thresholds))
for i, t in enumerate(thresholds):
decisions = (scores >= t).astype(int)
hits = np.sum((decisions == 1) & (labels == 1))
fa = np.sum((decisions == 1) & (labels == 0))
hr[i] = hits / n_signal
far[i] = fa / n_noise
return far, hr
- `scores` contains the numeric evidence value for each trial
- `labels` contains 1 for signal trials and 0 for noise trials
- The function sweeps over all unique score values as candidate thresholds, from highest (most conservative) to lowest (most liberal)
- At each threshold, a trial is classified as "yes" when its score is at or above the threshold
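The sweep can be traced by hand on a five-trial example. This standalone sketch repeats the threshold logic inline rather than calling the function above:

```python
import numpy as np

scores = np.array([0.1, 0.5, 0.9, 0.2, 0.8])
labels = np.array([1, 1, 1, 0, 0])  # 1 = signal, 0 = noise

# Unique thresholds from highest to lowest, plus one above every score.
thresholds = np.sort(np.unique(scores))[::-1]
thresholds = np.concatenate([[thresholds[0] + 1.0], thresholds])

points = []
for t in thresholds:
    yes = scores >= t
    hr = np.sum(yes & (labels == 1)) / np.sum(labels == 1)
    far = np.sum(yes & (labels == 0)) / np.sum(labels == 0)
    points.append((far, hr))
# points runs from (0.0, 0.0) at the strictest threshold to (1.0, 1.0).
```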
Match each threshold choice to its likely effect on hit rate and false alarm rate.
- Very high threshold
- Low hit rate and low false alarm rate (conservative: the observer rarely responds).
- Very low threshold
- High hit rate and high false alarm rate (liberal: the observer almost always responds).
- Threshold at the midpoint of all scores
- Intermediate hit rate and false alarm rate (moderate operating point).
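The three cases can be checked numerically on synthetic Gaussian scores. This is a sketch; the exact thresholds chosen here (the maximum, minimum, and median score) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = np.concatenate([rng.standard_normal(100) + 1.5,  # signal trials
                         rng.standard_normal(100)])       # noise trials
labels = np.concatenate([np.ones(100, dtype=int), np.zeros(100, dtype=int)])

def rates(threshold):
    yes = scores >= threshold
    hr = np.sum(yes & (labels == 1)) / 100
    far = np.sum(yes & (labels == 0)) / 100
    return hr, far

high = rates(scores.max())      # conservative: both rates near 0
low = rates(scores.min())       # liberal: both rates equal 1.0
mid = rates(np.median(scores))  # moderate: hit rate exceeds false alarm rate
```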
Area Under the ROC Curve
- Any single (FAR, HR) pair depends on the threshold chosen, which may vary between observers or experiments
- The area under the ROC curve (AUC) summarizes performance across all thresholds with a single number
- AUC = 0.5: the ROC is the diagonal, i.e., chance performance
- AUC = 1.0: the ROC passes through (0, 1), i.e., perfect discrimination
- Interpretation: AUC equals the probability that a randomly chosen signal trial receives a higher evidence score than a randomly chosen noise trial
- The trapezoidal rule approximates AUC from the arrays of (FAR, HR) points:
$$\text{AUC} \approx \sum_i \tfrac{1}{2}(\text{HR}_i + \text{HR}_{i+1}) \cdot |\text{FAR}_i - \text{FAR}_{i+1}|$$
- Each term is the area of a trapezoid whose parallel sides are $\text{HR}_i$ and $\text{HR}_{i+1}$ and whose width is the step in FAR
- As the number of threshold steps increases, the sum converges to the true area
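The probability interpretation can be checked empirically by comparing every signal score against every noise score, with ties counting half. For N(0, 1) noise and N(1, 1) signal the expected value is $\Phi(1/\sqrt{2}) \approx 0.76$; this sketch estimates it from 500 trials per class:

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
signal = rng.standard_normal(500) + 1.0

# Fraction of (signal, noise) pairs in which the signal trial outscores
# the noise trial; ties contribute half a win.
wins = np.sum(signal[:, None] > noise[None, :])
ties = np.sum(signal[:, None] == noise[None, :])
prob = (wins + 0.5 * ties) / (500 * 500)  # close to 0.76
```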
import numpy as np

def auc(far, hr):
"""Return the area under the ROC curve using the trapezoidal rule.
The trapezoidal rule approximates the area as a sum of trapezoids:
AUC = sum_i 0.5 * (HR_i + HR_{i+1}) * |FAR_i - FAR_{i+1}|
This is a Riemann sum that converges to the true AUC as the number
of threshold steps increases. AUC = 0.5 for chance performance
(the diagonal) and AUC = 1.0 for perfect discrimination.
"""
far = np.asarray(far, dtype=float)
hr = np.asarray(hr, dtype=float)
# Sort by FAR so the trapezoidal sum goes left to right.
order = np.argsort(far)
sorted_far = far[order]
sorted_hr = hr[order]
widths = np.abs(np.diff(sorted_far))
heights = 0.5 * (sorted_hr[:-1] + sorted_hr[1:])
return float(np.sum(widths * heights))
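A three-point example makes the trapezoid sum easy to verify by hand; this standalone sketch mirrors the arithmetic inside the function:

```python
import numpy as np

far = np.array([0.0, 0.5, 1.0])
hr = np.array([0.0, 0.8, 1.0])

# Two trapezoids: 0.5*(0.0+0.8)*0.5 + 0.5*(0.8+1.0)*0.5 = 0.20 + 0.45
widths = np.diff(far)
heights = 0.5 * (hr[:-1] + hr[1:])
area = np.sum(widths * heights)  # 0.65
```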
An observer's evidence scores are identical for signal and noise trials, so their ROC curve follows the diagonal exactly. What is their AUC?
Visualizing the ROC Curve
import altair as alt

def plot_roc(roc_far, roc_hr, filename):
"""Save an ROC curve plot as an SVG file."""
curve_data = [{"far": float(f), "hr": float(h)} for f, h in zip(roc_far, roc_hr)]
diag_data = [{"far": 0.0, "hr": 0.0}, {"far": 1.0, "hr": 1.0}]
base = alt.Chart().encode(
x=alt.X("far:Q", title="False alarm rate", scale=alt.Scale(domain=[0, 1])),
y=alt.Y("hr:Q", title="Hit rate", scale=alt.Scale(domain=[0, 1])),
)
curve = base.mark_line(color="steelblue", strokeWidth=2).properties(
data=alt.Data(values=curve_data)
)
diagonal = base.mark_line(strokeDash=[4, 4], color="gray").properties(
data=alt.Data(values=diag_data)
)
chart = (curve + diagonal).properties(
title="ROC curve (threshold sweep)",
width=360,
height=360,
)
chart.save(filename)
Testing
- Hit rate is 1.0 when all signal trials are detected
- If the observer responds "yes" to every signal trial, hits equal total signal trials, so HR = 1.0
- False alarm rate is 0.0 when no noise trial triggers a response
- If the observer never responds "yes" on noise trials, false alarms = 0, so FAR = 0.0
- Rates are proportional to counts
- With 3 hits out of 4 signal trials and 1 false alarm out of 2 noise trials, HR = 0.75 and FAR = 0.5
- ROC starts at the origin
- At the threshold above every score, no trial is classified as "yes", so HR = 0 and FAR = 0
- ROC ends at (1, 1)
- At the threshold below every score, every trial is classified as "yes", so HR = 1 and FAR = 1
- ROC is monotonically increasing
- Lowering the threshold can only keep or increase both HR and FAR, never decrease either
- ROC passes through (0, 1) for perfectly separable scores
- When every signal score exceeds every noise score, one threshold admits all signals and no noise, placing a point at FAR = 0, HR = 1
- AUC of the diagonal is 0.5
- The diagonal ROC (FAR = HR) represents chance performance, so its area is exactly half the unit square
- AUC of a perfect step is 1.0
- A step from (0, 0) to (0, 1) to (1, 1) encloses the entire unit square
- AUC is above 0.5 for separable scores
- When signal scores are on average higher than noise scores, the ROC bows above the diagonal and AUC > 0.5
- AUC does not depend on the order of FAR values supplied
- The implementation sorts FAR internally, so reversing the input arrays gives the same result
import numpy as np
import pytest
from sdt import confusion_rates, roc_curve, auc
# ---------------------------------------------------------------------------
# confusion_rates
# ---------------------------------------------------------------------------
def test_hit_rate_all_hits():
# Every signal trial is detected: hit rate must be 1.0.
labels = [1, 1, 1, 0, 0]
decisions = [1, 1, 1, 0, 0]
hr, far = confusion_rates(labels, decisions)
assert hr == pytest.approx(1.0)
def test_false_alarm_rate_zero():
# No noise trial triggers a false alarm: FAR must be 0.0.
labels = [1, 0, 0, 0]
decisions = [1, 0, 0, 0]
_, far = confusion_rates(labels, decisions)
assert far == pytest.approx(0.0)
def test_hit_and_false_alarm_rates_proportional():
# 3 out of 4 signal trials are hits; 1 out of 2 noise trials is a false alarm.
labels = [1, 1, 1, 1, 0, 0]
decisions = [1, 1, 1, 0, 1, 0]
hr, far = confusion_rates(labels, decisions)
assert hr == pytest.approx(0.75)
assert far == pytest.approx(0.5)
# ---------------------------------------------------------------------------
# roc_curve
# ---------------------------------------------------------------------------
def test_roc_starts_at_origin():
# The first point (highest threshold) should classify nothing as signal,
# so both FAR and HR are 0.
scores = [0.1, 0.5, 0.9, 0.2, 0.8]
labels = [1, 1, 1, 0, 0 ]
far, hr = roc_curve(scores, labels)
assert far[0] == pytest.approx(0.0)
assert hr[0] == pytest.approx(0.0)
def test_roc_ends_at_one():
# The last point (lowest threshold) classifies everything as signal,
# so both FAR and HR are 1.
scores = [0.1, 0.5, 0.9, 0.2, 0.8]
labels = [1, 1, 1, 0, 0 ]
far, hr = roc_curve(scores, labels)
assert far[-1] == pytest.approx(1.0)
assert hr[-1] == pytest.approx(1.0)
def test_roc_monotonically_increasing():
# Both FAR and HR must be non-decreasing across threshold steps.
rng = np.random.default_rng(7493418)
scores = np.concatenate([rng.standard_normal(50), rng.standard_normal(50) + 1.5])
labels = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
far, hr = roc_curve(scores, labels)
assert all(far[i] <= far[i + 1] for i in range(len(far) - 1))
assert all(hr[i] <= hr[i + 1] for i in range(len(hr) - 1))
def test_roc_perfect_scores():
# When every signal score exceeds every noise score the ROC passes
# through (0, 1): at the threshold that admits all signals but no noise,
# FAR = 0 and HR = 1.
scores = [2.0, 3.0, 4.0, 0.5, 1.0]
labels = [1, 1, 1, 0, 0 ]
far, hr = roc_curve(scores, labels)
# The point (FAR=0, HR=1) must appear somewhere in the curve.
assert any(f == pytest.approx(0.0) and h == pytest.approx(1.0) for f, h in zip(far, hr))
# ---------------------------------------------------------------------------
# auc
# ---------------------------------------------------------------------------
def test_auc_chance_diagonal():
# The diagonal ROC (FAR = HR everywhere) has AUC = 0.5.
far = np.linspace(0, 1, 101)
hr = np.linspace(0, 1, 101)
assert auc(far, hr) == pytest.approx(0.5, abs=1e-6)
def test_auc_perfect():
# A step from (0, 0) to (0, 1) to (1, 1) encloses the full unit square: AUC = 1.0.
far = np.array([0.0, 0.0, 1.0])
hr = np.array([0.0, 1.0, 1.0])
assert auc(far, hr) == pytest.approx(1.0, abs=1e-6)
def test_auc_above_chance_for_separable_scores():
# When signal scores are generally higher than noise scores, AUC > 0.5.
rng = np.random.default_rng(7493418)
noise = rng.standard_normal(200)
signal = rng.standard_normal(200) + 1.5
scores = np.concatenate([noise, signal])
labels = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])
far, hr = roc_curve(scores, labels)
area = auc(far, hr)
assert area > 0.5
def test_auc_symmetric():
# auc should not depend on whether FAR is supplied in ascending or
# descending order, because the implementation sorts internally.
far = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
hr = np.array([0.0, 0.6, 0.8, 0.9, 1.0])
assert auc(far, hr) == pytest.approx(auc(far[::-1], hr[::-1]), abs=1e-10)
Signal detection key terms
- Hit rate
- The proportion of signal trials on which the observer correctly responds "yes": HR = hits / total signal trials; also called the true positive rate.
- False alarm rate
- The proportion of noise trials on which the observer incorrectly responds "yes": FAR = false alarms / total noise trials; also called the false positive rate.
- ROC curve
- The Receiver Operating Characteristic curve; a plot of hit rate against false-alarm rate as the decision threshold varies from very conservative to very liberal; produced by a threshold sweep over evidence scores.
- Area under the curve (AUC)
- A summary of ROC performance computed using the trapezoidal rule; AUC = 0.5 for chance performance and AUC = 1.0 for perfect discrimination; equals the probability that a random signal trial receives a higher evidence score than a random noise trial.
Note on the Gaussian Model
- The equal-variance Gaussian model summarizes performance compactly via $d' = \Phi^{-1}(\text{HR}) - \Phi^{-1}(\text{FAR})$, where $\Phi^{-1}$ is the inverse of the standard normal CDF
- This assumes both the noise distribution and the signal distribution are normal with equal variance
- Only their means differ
- The threshold-sweep ROC presented in this lesson makes no distributional assumptions
- It works for any numeric evidence score and any underlying distribution
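As an aside, $d'$ for the worked quiz numbers (HR = 0.70, FAR = 0.30) can be computed with the standard library alone; `NormalDist().inv_cdf` is the standard normal quantile function, i.e. $\Phi^{-1}$:

```python
from statistics import NormalDist

phi_inv = NormalDist().inv_cdf  # standard normal inverse CDF
d_prime = phi_inv(0.70) - phi_inv(0.30)  # about 1.05
```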
Exercises
Compute AUC from a small example
Given evidence scores [0.1, 0.4, 0.35, 0.8] and labels [0, 0, 1, 1]
(0 = noise, 1 = signal), trace through roc_curve by hand at each unique
score threshold.
Compute the AUC using the trapezoidal rule.
Check your answer against the function.
Effect of threshold on the confusion matrix
Using the synthetic data from generate_sdt.py, choose three thresholds:
the 25th, 50th, and 75th percentile of all scores.
For each threshold, compute HR and FAR and mark the corresponding point on
the ROC curve.
How does moving from a conservative threshold to a liberal threshold change
the confusion matrix?
Comparing two observers
Observer A has evidence scores that follow N(0, 1) for noise and N(1.0, 1) for signal. Observer B has scores that follow N(0, 1) for noise and N(2.0, 1) for signal. Generate 200 trials for each observer (RNG seed 7493418), compute their ROC curves and AUCs, and plot both curves on the same axes. Which observer has the higher AUC and why?
Trapezoidal approximation error
The trapezoidal rule is exact only when the curve is piecewise linear. Generate a fine-grained ROC (1 000 signal and 1 000 noise trials) and a coarse-grained ROC (20 signal and 20 noise trials) from the same underlying score distributions. Compare their AUC estimates. How large is the approximation error in the coarse case?