The area enclosed between the ROC curve and the x-axis, computed using
the trapezoidal rule as
$\text{AUC} \approx \sum_i \tfrac{1}{2}(\text{HR}_i + \text{HR}_{i+1})\,|\text{FAR}_{i+1} - \text{FAR}_i|$;
AUC = 0.5 for chance performance (the diagonal) and AUC = 1.0 for perfect
discrimination; it equals the probability that a randomly chosen signal trial
receives a higher evidence score than a randomly chosen noise trial.
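As a quick illustration, the trapezoidal computation can be sketched in a few lines of Python (the (FAR, HR) points below are made-up values, not data from any particular experiment):

```python
def trapezoid_auc(far, hr):
    """Area under the ROC curve by the trapezoidal rule.

    far, hr: lists of (FAR, HR) points sorted by FAR,
    including the endpoints (0, 0) and (1, 1).
    """
    auc = 0.0
    for i in range(len(far) - 1):
        # average height of the two HR values times the FAR step width
        auc += 0.5 * (hr[i] + hr[i + 1]) * abs(far[i + 1] - far[i])
    return auc

# The chance diagonal gives AUC = 0.5; a perfect ROC gives AUC = 1.0.
print(trapezoid_auc([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # 0.5
print(trapezoid_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0
```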
Authorship attribution
The task of identifying which of several candidate authors wrote an anonymous
text by comparing its stylistic features (such as character n-gram frequencies)
to profiles built from texts of known authorship; used in forensic linguistics
and literary studies to resolve questions of disputed or pseudonymous authorship.
Agent-based model
A simulation in which autonomous agents follow local rules and interact with one another;
emergent population-level patterns arise from many individual interactions rather than being
specified directly. Used in ecology, economics, and epidemiology to study complex systems
that cannot easily be described by closed-form equations.
B
Bag of words
A document representation that records only which words appear and how often,
discarding word order and position; sufficient for topic modeling because topics
are defined by co-occurrence patterns rather than syntactic structure.
Basic reproduction number
Denoted $R_0$, the average number of secondary infections caused by a single infectious
individual in a fully susceptible population; if $R_0 > 1$ an epidemic can grow, if
$R_0 \leq 1$ it dies out. For the SIR model, $R_0 = \beta / \gamma$.
Binary mask
A 2-D array of true/false (or 1/0) values indicating which pixels satisfy a criterion such
as exceeding an intensity threshold; used as input to connected-component labelling.
C
Cohen's kappa
A chance-corrected measure of inter-rater agreement:
$\kappa = (P_o - P_e) / (1 - P_e)$, where $P_o$ is the proportion of items
on which raters agree and $P_e$ is the agreement expected under independence.
$\kappa = 0$ at chance, $\kappa = 1$ at perfect agreement, $\kappa < 0$
when observed agreement falls below chance.
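The formula translates directly into code. A minimal sketch, taking a $K \times K$ contingency table as a list of lists (the example table is invented for illustration):

```python
def cohens_kappa(C):
    """Cohen's kappa from a K x K contingency table C (list of lists)."""
    n = sum(sum(row) for row in C)
    K = len(C)
    p_o = sum(C[i][i] for i in range(K)) / n                  # observed agreement
    row = [sum(C[i]) for i in range(K)]                       # rater A marginals
    col = [sum(C[i][j] for i in range(K)) for j in range(K)]  # rater B marginals
    p_e = sum(row[i] * col[i] for i in range(K)) / n**2       # chance agreement
    return (p_o - p_e) / (1 - p_e)

# 50 items: P_o = 0.7, P_e = 0.5, so kappa = 0.4
print(cohens_kappa([[20, 5], [10, 15]]))
```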
Confidence interval (parameter)
An interval constructed from data such that, over many repetitions of the
experiment, the true parameter value falls within the interval at the stated
rate (e.g., 95%); for nonlinear least squares the interval is formed as
$\hat{\theta} \pm z \cdot \text{SE}(\hat{\theta})$ where
$\text{SE}(\hat{\theta}) = \sqrt{\text{diag}(\text{pcov})}$ and $z = 1.96$
for a 95% CI under asymptotic normality.
Contingency table (raters)
A $K \times K$ integer matrix whose entry $C_{ij}$ counts the number of
observations for which rater A assigned category $i$ and rater B assigned
category $j$; diagonal entries are agreements and off-diagonal entries are
disagreements; used to compute Cohen's kappa.
Criterion (signal detection)
The threshold on the internal evidence axis above which an observer responds
"signal present"; $c = -\frac{1}{2}(\Phi^{-1}(\text{HR}) + \Phi^{-1}(\text{FAR}))$;
$c = 0$ for an unbiased observer, $c > 0$ for a conservative observer, and
$c < 0$ for a liberal observer.
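Both $c$ and $d'$ can be computed from a hit rate and false-alarm rate using the standard library's inverse normal CDF; a minimal sketch (the rates in the example are arbitrary):

```python
from statistics import NormalDist

def dprime_and_criterion(hr, far):
    """Sensitivity d' and criterion c from hit and false-alarm rates."""
    z = NormalDist().inv_cdf          # inverse standard normal CDF, Phi^{-1}
    d = z(hr) - z(far)                # d' = z(HR) - z(FAR)
    c = -0.5 * (z(hr) + z(far))       # c  = -(z(HR) + z(FAR)) / 2
    return d, c

# Symmetric rates imply an unbiased observer (c = 0) with positive d'
print(dprime_and_criterion(0.8, 0.2))
```

Note that `inv_cdf` is undefined at exactly 0 or 1, so extreme rates are usually nudged (e.g. replaced by $1/(2N)$) before this computation.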
Leave-one-out cross-validation
A model-evaluation procedure in which each observation is held out in turn,
the model is fitted on the remaining observations, and the prediction error
for the held-out point is recorded; averaging over all held-out errors
estimates prediction skill on new data without requiring a separate test set.
Core point
In DBSCAN, a point whose $\varepsilon$-neighborhood contains at least
min_samples points (including itself); core points are the seeds from
which clusters are expanded by recursively adding all $\varepsilon$-reachable
neighbors.
Cost surface
A raster (2-D grid) in which each cell stores the difficulty or energy cost
of moving through that location; derived from terrain attributes such as
elevation, slope, or land cover. Used in least-cost path analysis to convert
a geographic landscape into a weighted graph suitable for Dijkstra's algorithm.
Character n-gram
A sequence of $n$ consecutive characters in a text, including spaces and
punctuation; for a text of length $L$ there are $L - n + 1$ overlapping n-grams.
Character n-grams capture unconscious stylistic patterns such as preferred
letter combinations and word-boundary habits, making them useful for
authorship attribution.
Collapsed Gibbs sampling
An MCMC algorithm for Bayesian models in which some latent variables (here,
the document-topic and topic-word distributions) are analytically integrated
out before sampling; the remaining variables (topic assignments) are sampled
iteratively. "Collapsed" refers to the reduction in the number of explicit
variables, which improves mixing speed and simplifies implementation.
Cosine similarity
A measure of the angle between two vectors:
$\text{sim}(\mathbf{a}, \mathbf{b}) = (\mathbf{a}\cdot\mathbf{b}) / (\|\mathbf{a}\|\,\|\mathbf{b}\|)$;
ranges from $-1$ to $1$ in general, and for non-negative frequency vectors
from 0 (orthogonal, no shared features) to 1 (identical direction).
Used to compare n-gram frequency profiles because it is independent of text
length: profiles with the same relative frequencies score 1 regardless of
how many tokens produced them.
Cumulative sum (CUSUM) statistic
$C_t = \sum_{s=0}^{t} \hat{e}_s$, the running total of OLS residuals; used to detect
structural breaks because a shift in the series mean causes residuals to share a
systematic sign, producing a drift in $C_t$ that reverses direction at the break.
The estimated break location is the index where $|C_t|$ is largest.
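A minimal sketch of locating the break from a residual series (the synthetic residuals below simply change sign at the break, mimicking a mean shift):

```python
def cusum_break(residuals):
    """Index where |C_t| is largest, C_t being the running sum of residuals."""
    c, best_t, best_abs = 0.0, 0, 0.0
    for t, e in enumerate(residuals):
        c += e                         # C_t = C_{t-1} + e_t
        if abs(c) > best_abs:
            best_abs, best_t = abs(c), t
    return best_t

# Residuals drift down, then reverse at the mean shift: break at index 9,
# the last observation of the first regime.
print(cusum_break([-1.0] * 10 + [1.0] * 10))  # 9
```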
Censoring
In survival analysis, an observation is censored when the event of interest has not yet
occurred by the end of the study or when a participant leaves before the study ends; the
true event time is unknown but is known to exceed the censoring time. Right-censoring
(the most common form) occurs when participants are lost to follow-up or the study concludes.
Confusion matrix
A $2\times2$ (or larger) table that cross-tabulates true class labels against predicted
class labels; for binary classification it contains true negatives (TN), false positives
(FP), false negatives (FN), and true positives (TP), enabling computation of precision,
recall, and F1 score alongside overall accuracy.
Connected component
A maximal set of pixels (or grid cells) that are all above a threshold and are connected
to one another through shared edges or corners; each component is treated as a single object.
Cross-entropy loss
A loss function for classification: $-[y\log p + (1-y)\log(1-p)]$ for binary problems,
where $y$ is the true label and $p$ is the predicted probability; penalises confident
wrong predictions more heavily than uncertain ones.
Covariance matrix
A symmetric matrix whose diagonal entries are the variances of fitted parameters and whose
off-diagonal entries are the covariances between pairs of parameters; returned by least-squares
fitting routines to quantify parameter uncertainties.
D
d-prime (d')
The sensitivity index in signal detection theory: $d' = \Phi^{-1}(\text{HR}) - \Phi^{-1}(\text{FAR})$,
where $\Phi^{-1}$ is the inverse standard normal CDF; $d'$ equals the separation
between the signal and noise distributions in standard deviation units; $d' = 0$
at chance and $d' > 0$ when the observer can discriminate signal from noise.
DBSCAN
Density-Based Spatial Clustering of Applications with Noise; a clustering
algorithm that groups points by local density rather than distance to a centroid.
A point is a core point if at least min_samples points lie within radius
$\varepsilon$; clusters expand from core points through chains of $\varepsilon$-neighbors;
points unreachable from any core point are labelled noise ($-1$).
Dijkstra's algorithm
A shortest-path algorithm that processes nodes in order of increasing
accumulated cost from a source, using a priority queue; at each step it
relaxes the edges of the cheapest unvisited node and stops when the
destination is reached. For a grid with $V$ cells and $E$ edges,
runtime is $O((V + E)\log V)$ with a binary heap.
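A compact sketch using the standard library's binary heap, on a toy cost grid where (one common convention, assumed here) the cost of a step is the value of the cell being entered:

```python
import heapq

def least_cost(grid, start, goal):
    """Minimum accumulated cell cost from start to goal on a 4-connected grid."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: grid[start[0]][start[1]]}
    pq = [(dist[start], start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d                          # cheapest node is final when popped
        if d > dist.get((r, c), float("inf")):
            continue                          # stale queue entry, skip
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + grid[nr][nc]         # relax the edge into (nr, nc)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return float("inf")

# Going around the expensive middle row (cost 7) beats cutting through it (11)
print(least_cost([[1, 1, 1], [9, 9, 1], [1, 1, 1]], (0, 0), (2, 0)))
```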
Diachronic analysis
The study of how a linguistic feature (such as word frequency or grammatical
construction) changes across time; contrasted with synchronic analysis, which
examines a single point in time. In computational linguistics, diachronic
analysis typically involves aggregating text by period and tracking normalized
frequencies over time.
Dirichlet distribution
A distribution over probability vectors (vectors whose entries are
non-negative and sum to 1); parameterised by a concentration vector
$\boldsymbol{\alpha}$ whose magnitude controls how concentrated the samples
are. Small $\alpha_k$ values produce sparse distributions (most mass on
one component); large values produce uniform distributions. The Dirichlet
is the conjugate prior for the categorical distribution and is used in LDA
to model both document-topic and topic-word distributions.
Depth-first search
A graph traversal algorithm that explores as far as possible along each branch before
backtracking, implemented with a stack (iteratively) or recursion; used here to test
whether a path of open cells connects the top row to the bottom row of a grid.
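The top-to-bottom connectivity test reads almost directly off the definition; a minimal iterative sketch with an explicit stack (the tiny grids in the example are illustrative):

```python
def percolates(open_grid):
    """True if open cells connect the top row to the bottom row (4-connectivity)."""
    rows, cols = len(open_grid), len(open_grid[0])
    seen = set()
    # seed the search with every open cell in the top row
    stack = [(0, c) for c in range(cols) if open_grid[0][c]]
    while stack:
        r, c = stack.pop()                    # depth-first: last in, first out
        if (r, c) in seen:
            continue
        seen.add((r, c))
        if r == rows - 1:
            return True                       # reached the bottom row
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and open_grid[nr][nc]:
                stack.append((nr, nc))
    return False

print(percolates([[1, 0], [1, 0]]))  # True: left column is an open path
print(percolates([[1, 0], [0, 1]]))  # False: open cells touch only diagonally
```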
Diffusion equation
The partial differential equation $\partial c/\partial t = D\,\partial^2 c/\partial x^2$
that describes how a dissolved substance or heat spreads through a medium at a rate
proportional to the curvature of its concentration profile.
Distance matrix
A square symmetric matrix $D$ where entry $D(i,j)$ stores the evolutionary (or other
pairwise) distance between taxa $i$ and $j$; the input to tree-reconstruction algorithms.
Dynamic programming
An algorithmic strategy that solves a problem by breaking it into overlapping subproblems,
computing each subproblem once, and storing the results in a table to avoid redundant work.
E
Euler-Mascheroni constant
The constant $\gamma_E = \lim_{n\to\infty}(\sum_{k=1}^n 1/k - \ln n) \approx 0.5772$;
appears in the mean of the Gumbel extreme-value distribution as $\mathrm{E}[X] = \mu + \gamma_E \beta$.
Extreme value distribution
A family of distributions that arise as limiting distributions of sample maxima or minima;
the Gumbel (Type I) member is commonly used to model annual maximum river flows and other
environmental extremes.
F
False alarm rate
The proportion of noise-only trials on which an observer responds "signal
present": $\text{FAR} = P(\text{yes} \mid \text{noise}) = 1 - \Phi(c + d'/2)$;
also called the false positive rate or $1 - \text{specificity}$.
Flow network
A directed graph in which each edge carries a capacity or weight representing the rate at
which material (water, traffic, data) moves from source to destination; used here to model
pollutant transport in a river system.
First integral
A function of the state variables whose value is constant along every trajectory of a
dynamical system; also called a conserved quantity or invariant.
G
Gini coefficient
A scalar measure of inequality ranging from 0 (perfect equality) to $(n-1)/n$
(one agent holds all wealth). For sorted wealth values $w_{(1)} \leq \cdots \leq w_{(n)}$:
$G = (2\sum_{i=1}^n i\,w_{(i)} - (n+1)\sum w) / (n\sum w)$.
Geometrically, $G$ is twice the area between the Lorenz curve and the 45-degree line
of equality.
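The sorted-values formula is a one-pass computation; a minimal sketch (the wealth vectors in the example are toy inputs chosen to hit the two extremes):

```python
def gini(wealth):
    """Gini coefficient of a list of non-negative wealth values."""
    w = sorted(wealth)
    n, total = len(w), sum(w)
    # ranks i run from 1 to n over the sorted values
    weighted = sum(i * wi for i, wi in enumerate(w, start=1))
    return (2 * weighted - (n + 1) * total) / (n * total)

print(gini([1, 1, 1, 1]))  # 0.0: perfect equality
print(gini([0, 0, 0, 1]))  # 0.75 = (n-1)/n: one agent holds everything
```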
Gradient descent
An iterative optimisation algorithm that updates parameters in the direction of the
negative gradient of the loss function; at each step, $\theta \leftarrow \theta - \eta \nabla L$
where $\eta$ is the learning rate. For convex losses such as binary cross-entropy,
gradient descent converges to the global minimum.
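As a concrete instance, batch gradient descent on binary cross-entropy for a 1-D logistic regression fits in a short loop. This is a bare-bones sketch (learning rate, step count, and the toy separable data are arbitrary choices, not recommendations):

```python
from math import exp

def fit_logistic(xs, ys, eta=0.1, steps=5000):
    """1-D logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + exp(-(w * x + b)))  # sigmoid of the linear score
            gw += (p - y) * x / n                # gradient of mean cross-entropy
            gb += (p - y) / n
        w -= eta * gw                            # step against the gradient
        b -= eta * gb
    return w, b

w, b = fit_logistic([-2, -1, 1, 2], [0, 0, 1, 1])
print(w, b)  # positive slope, intercept near zero for this symmetric data
```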
Gaussian filter
A smoothing operation that replaces each pixel (or grid point) with a weighted average of
its neighbours, where the weights follow a Gaussian (bell-curve) shape centred on the point.
Gumbel probability paper
A plotting technique that rescales the horizontal axis by the reduced variate
$y = -\ln(-\ln p)$, so that data following a Gumbel distribution plot as a straight line;
departures from linearity indicate poor fit.
Global alignment
An alignment that spans both sequences from end to end, inserting gaps wherever necessary
to align every character; the Needleman-Wunsch algorithm finds the highest-scoring global
alignment using dynamic programming.
Ghost cell
An extra grid point added outside the physical boundary of a numerical simulation solely
to store a boundary value; it allows interior update formulas to be applied uniformly at
the edge without special-casing.
H
Hit rate
The proportion of signal trials on which an observer correctly responds
"signal present": $\text{HR} = P(\text{yes} \mid \text{signal}) = 1 - \Phi(c - d'/2)$;
also called the true positive rate or sensitivity (not to be confused with
$d'$ sensitivity).
Hamming distance
The number of positions at which two sequences of equal length differ; when
normalized by the sequence length, it gives the fraction of positions that differ.
Used in stemma reconstruction as a measure of how many variant loci separate
two manuscripts, and in information theory to measure the number of bit errors
between transmitted and received strings.
Heat flux
The rate of heat energy transfer per unit area through a surface, measured in watts per
square metre (W m⁻²); in steady-state conduction it equals $-k\,dT/dx$ and is uniform
throughout any composite wall.
I
Inter-rater agreement
The degree to which two or more independent raters assign the same category
to the same observations; raw percent agreement is easy to compute but is
inflated by chance agreement; Cohen's kappa corrects for this inflation and
is the standard measure in psychology, clinical research, and annotation.
Inverse-distance weighting (IDW)
A spatial interpolation method that estimates the value at a query point as
a weighted average of observed station values, where each station's weight
is $d_i^{-p}$ ($d_i$ is the distance from the query point, $p$ is the power
parameter); larger $p$ concentrates influence on the nearest stations.
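A minimal sketch of the weighted average, with the usual special case when the query point coincides with a station (station coordinates and values below are invented):

```python
from math import hypot

def idw(stations, values, query, p=2.0):
    """Inverse-distance-weighted estimate at a query point.

    stations: list of (x, y) tuples; values: observed values at those
    stations; query: (x, y); p: the power parameter.
    """
    num = den = 0.0
    for (x, y), v in zip(stations, values):
        d = hypot(x - query[0], y - query[1])
        if d == 0.0:
            return v                  # query coincides with a station
        w = d ** -p                   # weight falls off as d^(-p)
        num += w * v
        den += w
    return num / den

# Midway between two stations, both weights are equal: plain average
print(idw([(0, 0), (2, 0)], [10.0, 20.0], (1, 0)))  # 15.0
```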
J
Jacobi iteration
An iterative method for solving systems of equations in which each unknown is updated
to the value that would satisfy its equation if all other unknowns are held at their
previous values; updates are applied simultaneously to all unknowns in each sweep.
For the steady-state heat equation on a 1D grid, each interior node is replaced by the
conductivity-weighted average of its two neighbours, and sweeps continue until
successive solutions differ by less than a prescribed tolerance.
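For the simplest case of uniform conductivity (so the weighted average reduces to a plain average), the sweep looks like this minimal sketch with fixed end temperatures:

```python
def jacobi_heat(T_left, T_right, n_interior, tol=1e-10, max_sweeps=100_000):
    """Steady-state temperatures on a uniform 1-D rod via Jacobi iteration."""
    T = [T_left] + [0.0] * n_interior + [T_right]
    for _ in range(max_sweeps):
        new = T[:]                                 # simultaneous update
        for i in range(1, len(T) - 1):
            new[i] = 0.5 * (T[i - 1] + T[i + 1])   # average of neighbours
        if max(abs(a - b) for a, b in zip(new, T)) < tol:
            return new                             # converged
        T = new
    return T

# Steady state on a uniform rod is the linear profile between the ends
print(jacobi_heat(0.0, 100.0, 3))  # approx [0, 25, 50, 75, 100]
```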
K
K-means clustering
An iterative algorithm that partitions $n$ data points into $K$ clusters by alternating
between two steps: assigning each point to its nearest centroid (using Euclidean distance)
and updating each centroid as the coordinate-wise mean of all points assigned to it.
The algorithm stops when assignments no longer change. Because initialization is random,
results can vary across runs; the number of clusters $K$ must be chosen in advance.
Term frequency (TF)
The count of how many times each word in the vocabulary appears in a given document,
stored as a vector with one entry per vocabulary word. Raw TF counts are proportional
to document length, so longer documents produce larger vectors; dividing by the document's
L2 norm (or total token count) gives a length-normalized vector that reflects the
shape of word use rather than total volume.
KD-tree
A binary space-partitioning tree that organises $k$-dimensional points by
recursively splitting along alternating coordinate axes. Supports efficient
range queries: all points within a given distance of a query point can be
retrieved in $O(\log n)$ time on average, making it well-suited for the
epsilon-neighborhood lookups required by DBSCAN.
Kaplan-Meier estimator
A non-parametric estimator of the survival function that accounts for censored observations
by updating only at observed event times: $\hat{S}(t) = \prod_{t_i \leq t}(1 - d_i/n_i)$
where $d_i$ is the number of events and $n_i$ the number at risk at time $t_i$.
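The product-limit update can be sketched directly from the formula; this toy implementation (event times and censoring flags in the example are invented) updates $\hat{S}$ only at observed event times and shrinks the risk set at every departure:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times: observation times; events: 1 if the event occurred, 0 if censored.
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    s, curve = 1.0, []
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = leaving = 0
        while i < len(order) and times[order[i]] == t:
            d += events[order[i]]       # events (d_i) at time t
            leaving += 1                # everyone leaving the risk set at t
            i += 1
        if d > 0:
            s *= 1.0 - d / at_risk      # update only at event times
            curve.append((t, s))
        at_risk -= leaving
    return curve

print(kaplan_meier([1, 2, 3], [1, 0, 1]))  # censoring at t=2 skips an update
```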
L
Landscape archaeology
A subfield of archaeology that studies the spatial distribution of human
activity across the physical environment; methods include least-cost path
analysis (reconstructing movement corridors), viewshed analysis, and
site-catchment analysis.
Least-cost path
The route through a cost surface that minimises total accumulated travel cost
from a source to a destination; found by applying Dijkstra's algorithm to a
grid in which each cell's value represents movement difficulty. Paths favour
valleys and passes over ridges when elevation is used as the cost proxy.
Learning curve
A plot of performance (such as reaction time or error rate) against the
amount of practice (trial number or hours of training); empirically follows
a power law: $\text{RT}(n) = A \cdot n^{-b}$, where $A$ is the initial
performance and $b > 0$ is the learning rate exponent.
Latent Dirichlet Allocation (LDA)
A generative probabilistic model (Blei, Ng, Jordan 2003) for text corpora
in which each document is represented as a mixture of topics and each topic
is a distribution over words; both mixtures are drawn from Dirichlet priors
with concentration parameters $\alpha$ (document-topic) and $\beta$ (topic-word).
Topics and mixtures are latent (unobserved) and must be inferred from the
observed words.
Local alignment
An alignment that finds the highest-scoring matching subsequence anywhere within two
sequences, rather than aligning them from end to end; used to locate conserved domains
within longer sequences.
Lorenz curve
A plot of the cumulative share of total wealth held by the bottom $x$ fraction of
the population (sorted from poorest to richest); a perfectly equal distribution
produces the 45-degree diagonal, while any inequality bows the curve below it.
The Gini coefficient equals twice the area between the Lorenz curve and the diagonal.
Log-log regression
A linear regression of $\ln y$ on $\ln x$; the OLS slope directly estimates the exponent
of an underlying power-law relationship $y = A x^b$, and is used in economics to estimate
price elasticity of demand.
Log-normal distribution
A probability distribution for a positive random variable $X$ such that $\ln X$ follows
a normal distribution; characterised by its log-mean $\mu_y$ and log-standard-deviation
$\sigma_y$, and widely used in hydrology to model annual maximum river flows.
Logistic regression
A classification model that estimates the probability of class membership using a linear
combination of features passed through the sigmoid function; fitted by minimising binary
cross-entropy loss with gradient descent.
Lomb-Scargle periodogram
A method for detecting periodic signals in unevenly sampled time series by
least-squares fitting of sinusoids at a grid of trial frequencies and recording
the resulting power at each; unlike the classical FFT-based periodogram it does
not require regularly spaced observations, which makes it a standard tool for
astronomical data such as radial-velocity measurements.
Lotka-Volterra model
A pair of ordinary differential equations describing the coupled dynamics of a prey
population and a predator population, producing oscillating cycles when the system is
displaced from its equilibrium.
M
Moving-Average Type-Token Ratio (MATTR)
A length-independent measure of vocabulary richness: TTR is computed over each
overlapping window of fixed width $w$ and the results are averaged. Because every
window has the same length, MATTR can fairly compare texts of different total sizes,
unlike the global TTR which falls as text length increases.
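The windowed averaging is straightforward to sketch; in this toy version (window width and token lists are illustrative) each overlapping window contributes one TTR value:

```python
def mattr(tokens, w=100):
    """Moving-Average Type-Token Ratio over windows of width w."""
    if len(tokens) < w:
        raise ValueError("text shorter than the window")
    ttrs = [len(set(tokens[i:i + w])) / w          # TTR of one window
            for i in range(len(tokens) - w + 1)]   # all overlapping windows
    return sum(ttrs) / len(ttrs)

print(mattr(["a", "b", "c", "d"] * 25, w=4))  # 1.0: every window fully distinct
print(mattr(["a"] * 100, w=10))               # 0.1: one type per 10 tokens
```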
Moore neighbourhood
The set of up to 8 cells immediately surrounding a cell in a 2-D grid: the
4 horizontal/vertical neighbours and 4 diagonal neighbours. Cells on the grid
boundary have fewer than 8 neighbours. Used in the Schelling model and in
cellular automata to define local interactions.
Method of moments
A parameter-estimation technique that sets the distribution's theoretical moments (mean,
variance, etc.) equal to their sample counterparts and solves for the unknown parameters;
simpler to compute than maximum likelihood but generally less statistically efficient.
Moving average
A smoothing filter that replaces each value in a series with the mean of a fixed-width
window of neighbouring values, reducing point-to-point noise variance by a factor equal
to the window width.
N
Noise point
In DBSCAN, a point that is not a core point and is not within distance
$\varepsilon$ of any core point; labelled $-1$ and not assigned to any cluster.
Noise points correspond to outliers or isolated observations that do not belong
to any dense region.
Normalized frequency
A word's count in a period divided by the total token count for the same period,
expressing the word's share of the corpus rather than its raw occurrence count.
Normalized frequencies sum to 1.0 within each period and are comparable across
periods of different sizes, unlike raw counts.
N-gram profile
The relative frequency distribution of all observed n-grams in a text or
collection of texts; constructed by counting every n-gram and dividing by
the total count. As a probability distribution it can be compared between
authors using cosine similarity or other distance measures.
Neighbor-joining
A distance-based algorithm (Saitou and Nei, 1987) that reconstructs a phylogenetic tree
by iteratively joining the pair of taxa with the smallest corrected distance, running in
$O(N^3)$ time.
Needleman-Wunsch algorithm
A dynamic programming algorithm (Needleman and Wunsch, 1970) for global sequence alignment.
The first row and column of the scoring matrix are initialised to cumulative gap penalties,
and the recurrence fills remaining cells as the maximum of a diagonal (match/mismatch) move,
an upward (gap in the second sequence) move, or a leftward (gap in the first sequence) move.
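The initialisation and recurrence translate into a short score-only implementation (traceback omitted; the match/mismatch/gap values are example scores, not a standard scheme):

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score via the Needleman-Wunsch recurrence."""
    rows, cols = len(a) + 1, len(b) + 1
    S = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        S[i][0] = i * gap                 # first column: leading gaps in b
    for j in range(1, cols):
        S[0][j] = j * gap                 # first row: leading gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag,           # match/mismatch move
                          S[i - 1][j] + gap,   # gap in the second sequence
                          S[i][j - 1] + gap)   # gap in the first sequence
    return S[-1][-1]

print(needleman_wunsch("GATT", "GATT"))  # 4: four matches
print(needleman_wunsch("GAT", "GATT"))   # 1: three matches, one gap
```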
Non-maximum suppression
A post-processing step that scans a list of candidate detections and discards any candidate
that is not the tallest within a specified neighbourhood, preventing one broad peak or blob
from being reported as multiple detections.
O
Ordinary least squares
A method for fitting a linear model $y = Xb$ by minimising the sum of squared residuals
$\|y - Xb\|^2$; the closed-form solution is $\hat{b} = (X^TX)^{-1}X^Ty$, and the
uncertainty of coefficient $j$ is quantified by the standard error
$\text{SE}(\hat{b}_j) = \sqrt{\hat{\sigma}^2\,[(X^TX)^{-1}]_{jj}}$.
P
Price elasticity of demand
The percentage change in quantity demanded for a 1% change in price:
$\varepsilon = \partial \ln Q / \partial \ln P$; values below $-1$ indicate elastic demand
(quantity is highly responsive to price) and values between $-1$ and $0$ indicate inelastic
demand.
Power law
A mathematical relationship of the form $y = A x^b$; on a log-log plot the
relationship appears as a straight line with slope $b$. In learning-curve
analysis $b > 0$ is the learning rate exponent, quantifying how rapidly
performance improves with practice.
Percolation threshold
The critical probability $p_c$ at which a large random network transitions from almost
certainly disconnected to almost certainly connected; for 2D site percolation on a square
lattice with 4-neighbor connectivity, $p_c \approx 0.5927$.
Phase portrait
A plot of one state variable against another (e.g., predator vs. prey population) showing
the family of trajectories a dynamical system can follow, without reference to time.
Phylogenetic tree
A branching diagram showing the inferred evolutionary relationships among a set of taxa,
with leaves representing observed taxa and internal nodes representing their common ancestors.
Q
Q-matrix
A corrected distance matrix used in the neighbor-joining algorithm where each entry
$Q(i,j) = (n-2)D(i,j) - \sum_k D(i,k) - \sum_k D(j,k)$ removes the bias introduced by
branches of unequal length.
R
ROC curve
Receiver Operating Characteristic curve; a plot of hit rate against
false-alarm rate as the decision criterion varies from very conservative
to very liberal; summarized by the area under the curve (AUC), which equals
the probability that a randomly chosen signal trial produces higher internal
evidence than a randomly chosen noise trial; AUC = 0.5 at chance and
AUC = 1.0 at perfect sensitivity.
Right-censoring
A form of censoring in survival analysis where the true event time is known only to exceed
the observation time; arises when a study ends before all participants experience the event
or when participants withdraw. The Kaplan-Meier estimator handles right-censored data
without bias.
Rolling window
A fixed-width buffer of the $w$ most recent observations used to compute local summary
statistics (mean, variance) that adapt to gradual changes in the signal; also called a
moving window or sliding window.
Radial velocity
The component of a star's velocity along the observer's line of sight; measured via
Doppler shifts in spectral lines and used to detect the gravitational wobble caused by
an orbiting planet.
Reduced variate
The dimensionless quantity $y = -\ln(-\ln p)$ used in extreme-value analysis; it linearises
the Gumbel CDF so that data following a Gumbel distribution plot as a straight line against
the corresponding quantiles on Gumbel probability paper.
Return period
Also called recurrence interval; the average time between events that exceed a given
threshold. A $T$-year event has probability $1/T$ of being exceeded in any single year
and probability $1-(1-1/T)^T \approx 63\%$ of being exceeded at least once in $T$ years.
S
Signal detection theory
A framework for separating an observer's perceptual sensitivity from their
response tendency by modeling the internal decision variable as two overlapping
normal distributions (signal and noise) and fitting both the hit rate and
false-alarm rate jointly; the key measures are $d'$ (sensitivity) and
criterion $c$ (response bias).
Spatial interpolation
The estimation of a continuous spatial field (such as temperature or
precipitation) at unsampled locations from observations at a discrete set of
known locations; methods include inverse-distance weighting, kriging, and
splines; all rely on the principle of spatial autocorrelation.
Spatial clustering
The task of partitioning geographic point locations into groups whose members
are spatially proximate to each other, without specifying the number of groups
in advance; used in landscape archaeology, ecology, and epidemiology to
identify concentrations of sites, organisms, or cases.
Stemma
The family tree of manuscript copies of a historical text, showing which copies
derive from which others; internal nodes represent lost intermediate ancestors and
leaves represent surviving manuscripts. In classical philology the stemma is used
to reconstruct the most likely original reading when manuscripts disagree.
Stemma reconstruction
The computational task of recovering the copying tree of a set of manuscripts
from patterns of shared variant readings; analogous to phylogenetic tree
reconstruction in evolutionary biology, using the same neighbor-joining algorithm
applied to a Hamming-distance matrix rather than a DNA substitution matrix.
Schelling model
An agent-based simulation of residential segregation (Schelling, 1971) in which
agents on a grid move to random empty cells whenever fewer than a threshold fraction
of their occupied neighbours share their type; even mild thresholds (around 30%)
produce strong large-scale segregation, illustrating that macro-level patterns
can emerge from micro-level rules without any deliberate collective intent.
Structural break
A point in time at which the statistical properties of a time series (mean, variance,
or trend) shift abruptly and permanently; also called a change point or regime change.
Failing to account for a break leads to biased parameter estimates and unreliable
forecasts when a regression is fitted over a period that straddles the break.
Sigmoid function
The function $\sigma(z) = 1/(1 + e^{-z})$ that maps any real number to the open interval
$(0,1)$; used in logistic regression to convert a linear score into a probability.
SIR model
A compartmental epidemic model that divides a fixed population into Susceptible, Infectious,
and Recovered compartments; individuals move irreversibly from S to I at rate $\beta SI/N$
and from I to R at rate $\gamma I$, producing a single epidemic wave when $R_0 = \beta/\gamma > 1$.
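The compartment dynamics can be integrated with a simple forward-Euler loop; this is a sketch, not a production integrator (step size, horizon, and the example parameters $\beta = 0.3$, $\gamma = 0.1$ are arbitrary choices):

```python
def sir(beta, gamma, N, I0, dt=0.01, t_max=200.0):
    """Forward-Euler integration of the SIR model; returns final (S, I, R)."""
    S, I, R = N - I0, float(I0), 0.0
    for _ in range(int(t_max / dt)):
        new_inf = beta * S * I / N * dt   # S -> I transitions this step
        new_rec = gamma * I * dt          # I -> R transitions this step
        S -= new_inf
        I += new_inf - new_rec
        R += new_rec
    return S, I, R

# R_0 = beta/gamma = 3 > 1: a single wave that leaves some susceptibles
S, I, R = sir(beta=0.3, gamma=0.1, N=1000, I0=1)
print(round(S), round(I), round(R))
```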
Site percolation
A percolation model in which each site (cell) of a lattice is independently declared open
with probability $p$; the question is whether open sites form a connected path from one
boundary to the other. Contrast with bond percolation, where edges rather than sites are opened.
Semi-amplitude
The quantity $K$ in a sinusoidal radial-velocity model, equal to the maximum speed of the
star relative to the centre of mass of the star-planet system; related to the planet's
minimum mass and orbital period.
Smith-Waterman algorithm
A dynamic-programming algorithm (Smith and Waterman, 1981) that computes the optimal local
alignment of two sequences by filling a scoring matrix with a recurrence that includes a
zero floor, allowing alignments to start anywhere.
Spike outlier
An isolated measurement that is far from the local baseline and immediately returns to
normal; contrasted with a step change, which is a sustained shift. A rolling z-score
detector flags the spike reading but quickly returns to normal z-scores once the spike
leaves the window.
Step change
A sudden, sustained shift in the baseline level of a time series; the rolling-window
z-score detector flags readings near the transition but adapts once the window has fully
moved into the new regime.
Survival function
The probability that a randomly chosen individual has not yet experienced the event of
interest by time $t$: $S(t) = P(T > t)$; always starts at 1, is non-increasing, and
approaches 0 as $t \to \infty$ if all individuals eventually experience the event.
T
Type-token ratio (TTR)
The number of distinct word types divided by the total number of word tokens in a
text: $\text{TTR} = V/T$. A value near 1 indicates high lexical diversity (few
repetitions); a value near 0 indicates heavy repetition. TTR is sensitive to text
length — it decreases for longer texts even when richness is constant — so MATTR
is preferred for cross-text comparisons.
Transfer matrix
A square matrix $T$ where entry $T[i,j]$ gives the fraction of material at node $j$ that
moves to node $i$ in one time step; when every column sums to 1 (column-stochastic), the
total amount of material is exactly conserved at every step.
Thermal conductivity
A material property $k$ (W m⁻¹ K⁻¹) measuring how readily heat flows through a material;
high $k$ (e.g. metals) allows large heat flux for a small temperature gradient.
Thermal resistance
For a uniform layer of thickness $L$ and conductivity $k$, the resistance $R = L/k$
(m² K W⁻¹) represents the temperature drop per unit heat flux; layers in series add.
Tridiagonal system
A linear system $A\mathbf{x} = \mathbf{b}$ in which $A$ has nonzero entries only on the
main diagonal and the two diagonals immediately above and below it; arises from 1D
finite-difference discretization and can be solved in $O(n)$ time.
Truncation error
The error introduced at each time step by approximating continuous derivatives with finite
differences; for the explicit diffusion scheme the global truncation error scales as
$O(\Delta t + \Delta x^2)$.
U
UPGMA
Unweighted Pair Group Method with Arithmetic Mean; a distance-based algorithm that
reconstructs a rooted phylogenetic tree by iteratively merging the pair of taxa (or
clusters) with the smallest average pairwise distance, placing each new node at height
$D(i,j)/2$. UPGMA assumes a molecular clock (constant evolutionary rate across all
lineages); when rates vary, the recovered topology may be incorrect.
V
Variant locus
A specific position in a text where at least two manuscripts have different
readings; the collection of variant loci across a set of manuscripts provides
the character data used to compute pairwise Hamming distances for stemma
reconstruction.
von Neumann stability condition
A constraint on the time step of an explicit finite-difference scheme, obtained
by requiring that Fourier error modes not grow from one step to the next; for
the explicit scheme for the 1-D diffusion equation it requires
$D\,\Delta t / \Delta x^2 \leq \tfrac{1}{2}$, and violating it causes the
numerical solution to oscillate and grow without bound.
W
Word shift
A change in the relative frequency of a word across time periods; a positive
shift means the word's share of tokens rose, a negative shift means it fell.
Quantified as the OLS slope of normalized frequency against decade index.
X
Y
Z
Zipf's law
The empirical regularity (Zipf, 1949) that the $k$-th most frequent word in a large
corpus appears roughly $1/k$ times as often as the most frequent word; equivalently,
word frequency is approximately proportional to $1/\text{rank}$. The resulting
distribution has a heavy tail: a few words are extremely common and a large number
of words are rare. Used here to generate synthetic text corpora that mimic the
statistical properties of natural language.
Z-score
The number of standard deviations by which an observation deviates from a reference mean:
$z = (x - \bar{x}) / \hat{\sigma}$; used in anomaly detection to flag readings that are
unusually far from the local rolling mean.
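A rolling z-score detector can be sketched with the standard library; in this toy version (window width, threshold, and the synthetic spike series are illustrative) each point is scored against the trailing window that excludes it:

```python
from statistics import mean, stdev

def rolling_z_flags(series, w=20, thresh=3.0):
    """Indices whose z-score against the preceding w values exceeds thresh."""
    flags = []
    for t in range(w, len(series)):
        window = series[t - w:t]          # trailing window, excludes point t
        s = stdev(window)
        if s > 0 and abs(series[t] - mean(window)) / s > thresh:
            flags.append(t)
    return flags

# An alternating baseline with one spike at index 30: only the spike is flagged
series = [0.0, 1.0] * 15 + [50.0] + [0.0, 1.0] * 5
print(rolling_z_flags(series))  # [30]
```

Once the spike enters the trailing window it inflates the local standard deviation, so subsequent normal readings are not flagged, matching the behaviour described under "Spike outlier".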