Fitting a Learning Curve to Experimental Data

The Problem

Why is a power-law model used for learning curves rather than a linear model?

Because a power-law always has a lower sum of squared residuals than a line.
Wrong: which model fits better depends on the data; linear models can outperform power laws when the true relationship is linear.
Because a power-law captures the empirical observation that early practice
produces large gains and later practice produces smaller but continuing gains.
Correct: the power law RT = A * n^(-b) is a convex decreasing function whose slope flattens with increasing n, matching the diminishing-returns pattern observed in motor and cognitive skill learning.
Because a power-law guarantees that reaction times reach zero at large n.
Wrong: A * n^(-b) approaches zero asymptotically but never reaches it; real reaction times never fall to zero because every response requires some irreducible motor and cognitive processing time.
Because power-laws are easier to fit than lines with scipy.
Wrong: nonlinear fitting is more complex than linear fitting; the power-law is used because it describes the data well, not for computational convenience.

The Power-Law Learning Model

If A = 400 ms and b = 0.5, what is the predicted reaction time (in ms) on trial 4?
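Worked answer: substituting into RT = A * n^(-b) gives 400 * 4^(-0.5) = 400 / sqrt(4) = 200 ms. A one-line check:

```python
# Predicted RT on trial 4 under RT = A * n^(-b) with A = 400 ms, b = 0.5.
A, b, n = 400.0, 0.5, 4
rt = A * n ** (-b)  # 400 / sqrt(4) = 400 / 2
print(rt)  # → 200.0
```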

Generating Synthetic Trial Data

import numpy as np
import polars as pl

# N_TRIALS, TRUE_A, TRUE_B, NOISE_SD and SEED are module-level constants.


def make_trials(
    n_trials=N_TRIALS,
    a=TRUE_A,
    b=TRUE_B,
    noise_sd=NOISE_SD,
    seed=SEED,
):
    """Return a Polars DataFrame of per-trial reaction times.

    Columns:
        trial -- integer trial number starting at 1
        rt    -- reaction time in milliseconds

    Reaction times follow the power-law model RT = a * trial^(-b) with
    Gaussian noise of standard deviation noise_sd.  Any simulated RT below
    1 ms is clamped to 1 ms to ensure all values are positive.
    """
    rng = np.random.default_rng(seed)
    trials = np.arange(1, n_trials + 1, dtype=float)
    rt = a * trials ** (-b) + rng.normal(0.0, noise_sd, n_trials)
    rt = np.maximum(rt, 1.0)
    return pl.DataFrame({"trial": list(range(1, n_trials + 1)), "rt": rt.tolist()})
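To see the shape of the generated data without the Polars wrapper, here is a NumPy-only sketch of the same recipe. The constant values (80 trials, A = 500 ms, b = 0.30, noise SD = 20 ms) are taken from the test comments below; the seed here is arbitrary:

```python
import numpy as np

# Assumed values, matching the test comments; the seed is arbitrary.
N_TRIALS, TRUE_A, TRUE_B, NOISE_SD = 80, 500.0, 0.30, 20.0

rng = np.random.default_rng(0)
trials = np.arange(1, N_TRIALS + 1, dtype=float)
rt = TRUE_A * trials ** (-TRUE_B) + rng.normal(0.0, NOISE_SD, N_TRIALS)
rt = np.maximum(rt, 1.0)  # clamp, as in make_trials

print(rt[:3])   # early trials, near 500 ms
print(rt[-3:])  # late trials, roughly 135 ms plus noise
```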

Fitting with curve_fit

import numpy as np
from scipy.optimize import curve_fit

CI_Z = 1.96  # z multiplier for an approximate 95% confidence interval


def power_law(n, a, b):
    """Power-law learning model: RT = a * n^(-b)."""
    return a * n ** (-b)


def fit_power_law(trials, rt):
    """Fit a power-law learning curve to per-trial reaction-time data.

    Parameters
    ----------
    trials : array-like of trial numbers (1, 2, ..., N)
    rt     : array-like of reaction times in milliseconds

    Returns
    -------
    a    : fitted amplitude (predicted RT on trial 1, in ms)
    b    : fitted learning rate exponent (larger means faster improvement)
    b_ci : (lower, upper) 95% confidence interval for b

    scipy.optimize.curve_fit minimises the sum of squared residuals and returns
    the estimated parameter covariance matrix.  The standard error of b is the
    square root of the [1, 1] diagonal entry (its variance); multiplying by
    CI_Z = 1.96 gives the half-width of the approximate 95% CI.
    """
    trials = np.asarray(trials, dtype=float)
    rt = np.asarray(rt, dtype=float)
    p0 = [rt[0], 0.2]
    popt, pcov = curve_fit(power_law, trials, rt, p0=p0, maxfev=5000)
    a, b = popt
    se_b = float(np.sqrt(pcov[1, 1]))
    b_ci = (b - CI_Z * se_b, b + CI_Z * se_b)
    return float(a), float(b), b_ci

Confidence Interval from the Covariance Matrix

$$b \pm 1.96 \cdot \text{SE}(b), \qquad \text{SE}(b) = \sqrt{\text{pcov}[1,1]}$$

Order the steps to fit a power-law learning curve and report a confidence interval for b.

1. Choose initial parameter guesses p0 = [first RT, 0.2].
2. Call curve_fit(power_law, trials, rt, p0=p0) to obtain popt and pcov.
3. Extract the fitted exponent: b = popt[1].
4. Compute the standard error: SE = sqrt(pcov[1, 1]).
5. Form the 95% CI: (b - 1.96 * SE, b + 1.96 * SE).
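The steps above can be sketched end to end on synthetic data. The generating values (A = 500, b = 0.30, noise SD = 20 ms) are assumed from the test comments, and the seed is arbitrary:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    return a * n ** (-b)

# Synthetic data; parameter values assumed from the test comments.
rng = np.random.default_rng(0)
trials = np.arange(1, 81, dtype=float)
rt = power_law(trials, 500.0, 0.30) + rng.normal(0.0, 20.0, trials.size)

p0 = [rt[0], 0.2]                                     # initial guesses
popt, pcov = curve_fit(power_law, trials, rt, p0=p0)  # fit
b = popt[1]                                           # fitted exponent
se = float(np.sqrt(pcov[1, 1]))                       # standard error of b
ci = (b - 1.96 * se, b + 1.96 * se)                   # approximate 95% CI
print(f"b = {b:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```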

Visualizing the Fit

import altair as alt
import numpy as np


def plot_fit(trials, rt, a, b, b_ci, filename):
    """Save a scatter plot of raw data with the fitted power-law curve overlaid."""
    trials = np.asarray(trials, dtype=float)
    rt = np.asarray(rt, dtype=float)
    curve_n = np.arange(1, int(trials.max()) + 1, dtype=float)
    curve_rt = power_law(curve_n, a, b)

    raw = (
        alt.Chart(
            alt.Data(
                values=[{"trial": int(t), "rt": float(r)} for t, r in zip(trials, rt)]
            )
        )
        .mark_point(color="steelblue", opacity=0.6)
        .encode(
            x=alt.X("trial:Q", title="Trial number"),
            y=alt.Y("rt:Q", title="Reaction time (ms)"),
        )
    )

    fitted = (
        alt.Chart(
            alt.Data(
                values=[
                    {"trial": float(t), "rt": float(r)}
                    for t, r in zip(curve_n, curve_rt)
                ]
            )
        )
        .mark_line(color="firebrick", strokeWidth=2)
        .encode(
            x="trial:Q",
            y="rt:Q",
        )
    )

    chart = (raw + fitted).properties(
        title=(f"Power-law fit: b = {b:.3f}  (95% CI [{b_ci[0]:.3f}, {b_ci[1]:.3f}])"),
        width=480,
        height=300,
    )
    chart.save(filename)
A scatter plot with trial number on the x-axis and reaction time in milliseconds on the y-axis. Blue points show the noisy per-trial observations, decreasing from around 500 ms at trial 1 to around 135 ms by trial 80. A red power-law curve runs through the data, closely following the decreasing trend.
Figure 1: Per-trial reaction times (blue points) and the fitted power-law learning curve (red line) for 80 synthetic trials. The fitted exponent b = 0.298 is close to the true value of 0.30; the 95% confidence interval [0.278, 0.318] brackets the true value.

Testing

The suite checks six properties of the generator and fitter:

- Trial count
- All reaction times positive
- Trial numbers
- Fitted exponent close to true value
- CI brackets true exponent
- Predicted RT decreasing

import numpy as np
from generate_learning import make_trials, N_TRIALS, TRUE_B
from learning import fit_power_law, power_law


def test_trial_count():
    # Generator must produce exactly N_TRIALS rows.
    df = make_trials()
    assert df.height == N_TRIALS


def test_all_rt_positive():
    # All reaction times must be positive; values below 1 ms are clamped.
    df = make_trials()
    assert (df["rt"] > 0).all()


def test_trial_numbers():
    # Trial numbers must run from 1 to N_TRIALS without gaps.
    df = make_trials()
    assert df["trial"].to_list() == list(range(1, N_TRIALS + 1))


def test_fitted_exponent_close_to_true():
    # With 80 trials and noise SD = 20 ms the fitted exponent should recover
    # TRUE_B to within 0.10.  The signal spans about 365 ms (roughly 500 ms
    # at trial 1 down to about 135 ms at trial 80) while the noise SD is
    # 20 ms, giving a high SNR; with 80 data points curve_fit converges well
    # within 0.10 of the true value.
    df = make_trials()
    _, b, _ = fit_power_law(df["trial"].to_numpy(), df["rt"].to_numpy())
    assert abs(b - TRUE_B) < 0.10


def test_ci_brackets_true_exponent():
    # The 95% confidence interval returned by fit_power_law must contain TRUE_B.
    df = make_trials()
    _, _, b_ci = fit_power_law(df["trial"].to_numpy(), df["rt"].to_numpy())
    assert b_ci[0] <= TRUE_B <= b_ci[1]


def test_predicted_rt_decreasing():
    # The fitted curve must be strictly decreasing: performance always improves
    # with more practice when b > 0 and a > 0.
    df = make_trials()
    a, b, _ = fit_power_law(df["trial"].to_numpy(), df["rt"].to_numpy())
    predicted = power_law(np.arange(1, 21, dtype=float), a, b)
    assert all(predicted[i] > predicted[i + 1] for i in range(len(predicted) - 1))

Learning curve key terms

Learning curve
A plot of performance (such as reaction time or error rate) against practice amount (trial number or cumulative experience); the power-law learning curve RT = A * n^(-b) shows diminishing returns, with large gains early and smaller gains as practice continues
Power law
A mathematical relationship of the form y = A * x^c; on a log-log plot the relationship is linear with slope c. The learning model RT = A * n^(-b) is a power law with exponent -b, so b > 0 quantifies the rate at which performance improves with practice
Confidence interval (parameter)
An interval constructed from data such that, over many repetitions of the experiment, the true parameter value falls within the interval at the stated rate (e.g. 95%); for curve_fit the interval is formed as b +/- z * SE, where SE = sqrt(pcov[1, 1]) (the diagonal entry of the covariance matrix corresponding to b) and z = 1.96 for a 95% CI under asymptotic normality

Exercises

Log-log linearity

Take the logarithm of both trial and rt from the synthetic data and fit a straight line using numpy.polyfit. Compare the slope of the line to the exponent $b$ returned by curve_fit. Are they equal? If not, explain why the two estimates differ.
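A sketch of the comparison, assuming synthetic data generated as in make_trials (A = 500, b = 0.30, noise SD = 20 ms; arbitrary seed):

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    return a * n ** (-b)

rng = np.random.default_rng(0)
trials = np.arange(1, 81, dtype=float)
rt = power_law(trials, 500.0, 0.30) + rng.normal(0.0, 20.0, trials.size)

# Straight-line fit in log-log space: log(rt) ~ log(A) - b * log(n)
slope, intercept = np.polyfit(np.log(trials), np.log(rt), 1)
b_loglog = -slope

# Direct nonlinear fit on the raw scale
popt, _ = curve_fit(power_law, trials, rt, p0=[rt[0], 0.2])
b_direct = popt[1]

# The estimates differ: the log transform minimises squared error in log
# units, which weights proportional rather than absolute deviations, so the
# two fits respond differently to the same noise.
print(b_loglog, b_direct)
```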

Effect of noise level

Re-run the generator and fitter with noise standard deviations of 5, 20, 50, and 100 ms. For each noise level record the fitted $b$, the width of the 95% CI for $b$, and whether the CI brackets TRUE_B. At what noise level does the CI first fail to contain the true value?
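One way to structure the sweep, again using assumed generating values (A = 500, TRUE_B = 0.30) and an arbitrary fixed seed so each noise level sees the same rescaled noise pattern:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    return a * n ** (-b)

TRUE_B = 0.30
trials = np.arange(1, 81, dtype=float)
clean = power_law(trials, 500.0, TRUE_B)

widths = []
for noise_sd in (5.0, 20.0, 50.0, 100.0):
    rng = np.random.default_rng(0)  # same noise pattern, rescaled per level
    rt = np.maximum(clean + rng.normal(0.0, noise_sd, trials.size), 1.0)
    popt, pcov = curve_fit(power_law, trials, rt, p0=[rt[0], 0.2], maxfev=5000)
    b, se = popt[1], float(np.sqrt(pcov[1, 1]))
    lo, hi = b - 1.96 * se, b + 1.96 * se
    widths.append(hi - lo)
    print(f"sd={noise_sd:5.0f}: b={b:.3f}, CI width={hi - lo:.3f}, "
          f"brackets TRUE_B: {lo <= TRUE_B <= hi}")
```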

Confidence interval coverage

Generate 200 independent datasets using different random seeds and fit the power-law model to each. Record what fraction of the 95% confidence intervals contain TRUE_B. Is the empirical coverage close to 95%? What does it mean if the coverage is systematically lower?
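A minimal coverage simulation under the same assumed generating values (A = 500, TRUE_B = 0.30, noise SD = 20 ms), using the seed itself as the loop variable:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    return a * n ** (-b)

TRUE_B = 0.30
trials = np.arange(1, 81, dtype=float)

hits = 0
n_sims = 200
for seed in range(n_sims):
    rng = np.random.default_rng(seed)
    rt = np.maximum(
        power_law(trials, 500.0, TRUE_B) + rng.normal(0.0, 20.0, trials.size),
        1.0,
    )
    popt, pcov = curve_fit(power_law, trials, rt, p0=[rt[0], 0.2], maxfev=5000)
    b, se = popt[1], float(np.sqrt(pcov[1, 1]))
    if b - 1.96 * se <= TRUE_B <= b + 1.96 * se:
        hits += 1

print(f"empirical coverage: {hits / n_sims:.2f}")  # expect close to 0.95
```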

Alternative model

Fit an exponential learning model $\text{RT}(n) = c + (A - c)\,e^{-\lambda n}$, where $c$ is the asymptotic RT, $A$ is the initial RT, and $\lambda$ is the decay rate. Compare the sum of squared residuals of the exponential and power-law fits on the synthetic data. Which model fits better?
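A sketch of the head-to-head comparison on power-law data (generating values again assumed: A = 500, b = 0.30, noise SD = 20 ms; the exponential starting guesses are a reasonable but arbitrary choice):

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    return a * n ** (-b)

def exp_model(n, c, a, lam):
    # RT(n) = c + (A - c) * exp(-lambda * n)
    return c + (a - c) * np.exp(-lam * n)

rng = np.random.default_rng(0)
trials = np.arange(1, 81, dtype=float)
rt = power_law(trials, 500.0, 0.30) + rng.normal(0.0, 20.0, trials.size)

p_pow, _ = curve_fit(power_law, trials, rt, p0=[rt[0], 0.2], maxfev=5000)
p_exp, _ = curve_fit(exp_model, trials, rt,
                     p0=[rt.min(), rt[0], 0.1], maxfev=10000)

sse_pow = float(np.sum((rt - power_law(trials, *p_pow)) ** 2))
sse_exp = float(np.sum((rt - exp_model(trials, *p_exp)) ** 2))
print(f"SSE power-law: {sse_pow:.0f}   SSE exponential: {sse_exp:.0f}")
```

Since the data here are generated from the power law, expect it to fit at least as well; on real data the comparison is an empirical question.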