Glossary

A

A/B testing
A controlled experiment in which participants or users are randomly assigned to one of two variants to measure which performs better on a defined metric.
alternative hypothesis
The hypothesis that an effect exists; denoted H₁. Accepted when the null hypothesis is rejected.
axial coding
A stage in grounded theory analysis in which open codes are grouped into higher-level themes and the relationships between categories are examined.

B

blinding
An experimental design feature in which participants (single-blind), experimenters, or both (double-blind) do not know which condition a participant is in, reducing expectation bias.
blocking variable
A variable known to affect the outcome but not of experimental interest. It is controlled by grouping experimental units into blocks of similar values and then randomizing within each block, so that the nuisance factor is balanced across treatment conditions.
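A minimal sketch of blocked random assignment, assuming each unit can be mapped to a block label. The function name `blocked_assignment`, the `block_of` callback, and the condition names are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def blocked_assignment(units, block_of, conditions=("control", "treatment"), seed=0):
    """Randomly assign units to conditions separately within each block."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        for position, unit in enumerate(members):
            # Cycle through the conditions so each block is split as evenly as possible.
            assignment[unit] = conditions[position % len(conditions)]
    return assignment

# Hypothetical participants, blocked by experience level.
developers = [("ana", "junior"), ("bo", "junior"), ("cy", "senior"), ("di", "senior")]
groups = blocked_assignment(developers, block_of=lambda dev: dev[1])
```

Because assignment happens inside each block, every experience level contributes equally to both conditions, which simple randomization over the whole sample would not ensure.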

C

change failure rate
The proportion of deployments that result in degraded service or require remediation; one of the four DORA metrics used to measure software delivery performance.
claim
An assertion about the world that a study attempts to support or refute through evidence.
Cohen's d
A measure of effect size for comparing two means, expressed in units of pooled standard deviation. Conventionally, d = 0.2 is small, d = 0.5 is medium, and d = 0.8 is large.
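As an illustration of the definition above, the following sketch computes Cohen's d from two samples using the pooled sample standard deviation. The function name `cohens_d` and the example data are assumptions made for this sketch.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # variance() is the sample variance (n - 1 in the denominator).
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)

# Hypothetical task-completion scores for two groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

For these two samples the means differ by 1 and the pooled standard deviation is 2, so d is 0.5, a medium effect by the conventional thresholds.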
conclusion validity
The degree to which a study's statistical analysis is appropriate and adequately powered to detect the effect it reports; threatened by underpowered designs, violated test assumptions, and uncorrected multiple comparisons.
confirmation bias
The tendency to search for, interpret, and recall information in ways that confirm pre-existing beliefs, distorting the analysis of data.
confounding variable
A variable that is associated with both the independent variable and the outcome, potentially explaining an observed relationship without there being a direct causal link.
construct validity
The degree to which a measurement actually captures the concept it is intended to represent.
control group
The group in an experiment that does not receive the treatment; used as a baseline for comparison.
controlled experiment
A study in which one or more independent variables are manipulated while other factors are held constant, allowing causal claims to be made.
convenience sampling
Selecting participants based on ease of access rather than random selection, making the sample easier to recruit but harder to generalize from.

D

dependent variable
The outcome measured in an experiment; the variable expected to change in response to manipulation of the independent variable.
deployment frequency
How often an organization successfully deploys code to production; one of the four DORA metrics used to measure software delivery performance.
difference-in-differences
A quasi-experimental design that compares the change in outcome for a treated group to the change for a control group over the same period.
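The comparison described above reduces to simple arithmetic on four group means. The sketch below assumes one list of outcome values per group and period; the function name `diff_in_diff` and the numbers are hypothetical.

```python
from statistics import mean

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Hypothetical weekly merged pull requests per team.
estimate = diff_in_diff(
    treated_before=[10, 12], treated_after=[15, 17],
    control_before=[10, 10], control_after=[12, 12],
)
```

Here the treated teams improved by 5 and the control teams by 2, so the estimate attributed to the intervention is 3. The estimate is only credible if both groups would have followed parallel trends without the intervention.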
directed acyclic graph (DAG)
A diagram of nodes connected by directed edges with no cycles, used to represent causal assumptions about the relationships between variables.

E

effect size
A measure of the magnitude of an effect, independent of sample size. Examples include Cohen's d, odds ratio, and Pearson's r.
empirical software engineering
The subfield of software engineering that uses systematic data collection and analysis to study how software is built, maintained, and used.
evidence
Data or observations gathered through study to support or refute a claim. Evidence varies in quality and reliability depending on how it was collected.
experience sampling method
A research technique in which participants are prompted at random or scheduled intervals during their normal activities to report their current task, state, or perceptions, capturing in-the-moment data rather than retrospective recall.
external validity
The degree to which study findings generalize beyond the specific sample, setting, and task studied.

F

file drawer problem
The tendency for null results to go unpublished, causing the published literature to overrepresent positive findings.
funnel plot
A scatter plot used in meta-analysis that shows each study's effect size against a measure of its precision, such as sample size or standard error; asymmetry suggests publication bias.

G

gerund coding
A practice from grounded theory in which codes are written as verb phrases rather than nouns, capturing the actions, choices, and processes participants describe rather than labeling a static category.
Goal-Question-Metric (GQM)
A structured approach to defining measurements in which a study goal is decomposed into questions whose answers would indicate success, and each question is linked to a specific operationalized metric.
Goodhart's Law
The principle that when a measure becomes a target, it ceases to be a good measure, because people optimize the measured variable rather than the underlying goal it was meant to track.
grounded theory
A qualitative research methodology in which theory is developed inductively from data through iterative coding and constant comparison, rather than testing a predetermined hypothesis.

H

HARKing
Hypothesizing After Results are Known: presenting exploratory findings as if they had been predicted in advance, inflating false positive rates.

I

ignoring non-response
A bias that arises when researchers analyze only survey respondents without accounting for those who did not reply, which can skew results if non-respondents differ systematically from respondents.
independent variable
The variable that is manipulated or selected by the researcher to examine its effect on an outcome.
informed consent
A research ethics requirement that participants know what data is being collected, how it will be used, and that they can withdraw without penalty.
intercoder reliability
The degree of agreement between two or more researchers independently applying the same coding scheme to qualitative data. Commonly measured with Cohen's kappa for two coders, or Fleiss' kappa for more than two.
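A minimal sketch of Cohen's kappa for two coders, which corrects the observed agreement for the agreement expected by chance. The function name `cohens_kappa` and the code labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders labeling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: probability that both coders pick the same label
    # if each labeled independently with their own label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use a single identical label")
    return (observed - expected) / (1 - expected)

# Hypothetical codes applied to four interview excerpts.
kappa = cohens_kappa(
    ["tooling", "tooling", "process", "process"],
    ["tooling", "process", "process", "process"],
)
```

The coders agree on 3 of 4 excerpts (0.75), chance agreement is 0.5, and kappa is therefore 0.5.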
internal survey
A survey administered within a single organization to collect data from employees, commonly used to measure team practices, tool adoption, or workplace satisfaction.
internal validity
The degree to which a study can support the conclusion that the treatment caused the observed outcome, as opposed to some other explanation.
interrupted time series
A quasi-experimental design that looks for a change in the level or trend of an outcome at the point when an intervention occurred, using repeated measurements from before and after.

L

lead time for change
The elapsed time between a code change being committed and that change running in production; one of the four DORA metrics used to measure software delivery performance.
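As a sketch of how this metric might be computed, the example below takes pairs of commit and deployment timestamps and reports the median lead time in hours. The function name `median_lead_time_hours`, the input shape, and the choice of the median as a summary are assumptions made for illustration.

```python
from datetime import datetime
from statistics import median

def median_lead_time_hours(changes):
    """changes: iterable of (committed_at, deployed_at) datetime pairs."""
    hours = [
        (deployed_at - committed_at).total_seconds() / 3600
        for committed_at, deployed_at in changes
    ]
    return median(hours)

# Hypothetical changes with lead times of 2, 4, and 30 hours.
changes = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 13, 0)),
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 4, 15, 0)),
]
lead_time = median_lead_time_hours(changes)
```

The median is 4 hours here; unlike the mean, it is not pulled upward by the single slow change.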
leading question
A question worded in a way that suggests or implies a particular answer, potentially biasing respondents' replies.
learning effect
A threat to internal validity in within-subjects designs: participants improve on a later task or condition simply through practice, making the later treatment appear more effective regardless of its merit.
Likert scale
A survey response format using ordered categories ranging from "Strongly agree" to "Strongly disagree," typically with 5 or 7 points.

M

meta-analysis
A statistical technique for combining results from multiple independent studies of the same question to estimate an overall effect size.
mining software repositories (MSR)
A research approach that extracts and analyzes data from version control systems, issue trackers, code review tools, and related sources.

N

natural experiment
A study that exploits real-world variation that approximates random assignment, without the researcher directly manipulating any variable.
novelty effect
A threat to internal validity in which any new tool, method, or process receives a temporary performance boost because participants are motivated by its novelty, inflating the apparent benefit of the intervention.
null hypothesis
The hypothesis that there is no effect; denoted H₀. Statistical testing evaluates whether the data is inconsistent with the null hypothesis.

O

observational study
A study in which the researcher measures variables as they naturally occur, without manipulating any conditions.
open coding
The initial stage in qualitative data analysis in which the researcher reads through data and attaches descriptive labels (codes) to segments of text.
overgeneralization
Drawing conclusions that extend beyond what the data actually support, typically by ignoring important differences in context, population, or sample.

P

p-hacking
Trying multiple analyses or data subsets until p < 0.05 is achieved, then reporting only that analysis, inflating the false positive rate.
p-value
The probability of observing data at least as extreme as the observed data, assuming the null hypothesis is true. Not the probability that the null hypothesis is true.
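One way to make this definition concrete is an exact permutation test, sketched below for small samples. Under the null hypothesis the group labels are interchangeable, so the p-value is the share of all relabelings whose difference in means is at least as large as the observed one. The function name `exact_permutation_p_value` and the data are hypothetical, and this is only one of many ways to obtain a p-value.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means, by enumerating all relabelings."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = total = 0
    for indices in combinations(range(len(pooled)), len(group_a)):
        chosen = set(indices)
        relabeled_a = [pooled[i] for i in chosen]
        relabeled_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        difference = abs(mean(relabeled_a) - mean(relabeled_b))
        # Small tolerance guards against floating-point rounding in the comparison.
        if difference >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical measurements: 2 of the 20 possible relabelings are as extreme.
p = exact_permutation_p_value([1, 2, 3], [4, 5, 6])
```

With these samples the result is 2/20 = 0.1. Enumeration grows quickly with sample size, so larger studies typically sample relabelings at random instead.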
pre-registration
A practice in which researchers publicly commit to their hypotheses and analysis plan before collecting data, reducing the risk of HARKing and p-hacking.
proxy metric
A measurable quantity used as a stand-in for a concept that is harder to measure directly.
publication bias
The tendency for journals to publish studies with statistically significant results more often than studies with null results.

Q

qualitative methods
Research approaches that collect and analyze non-numerical data such as interviews, observations, and documents in order to understand meaning, context, and experience.
quantitative methods
Research approaches that collect and analyze numerical data to describe patterns, test hypotheses, or estimate effect sizes.

R

randomization
The random assignment of participants to experimental conditions, so that known and unknown confounders are balanced across groups in expectation.
retrospective analysis
An analysis of data collected before the research question was formulated, such as examining historical logs, commit records, or past surveys.

S

sampling strategy
The method used to select participants from a population. Strategies include convenience sampling, random sampling, and stratified sampling.
saturation
In qualitative research, the point at which collecting additional data no longer introduces new codes or themes.
selection bias
Bias introduced when the sample is not representative of the population the researcher intends to generalize to.
semi-structured interview
An interview that follows a predetermined set of questions but allows the interviewer to probe further or deviate based on participant responses.
statistical power
The probability that a study will detect an effect if one truly exists; conventionally targeted at 0.80.
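A rough sketch of how power depends on effect size and sample size, using a normal approximation for a two-sided comparison of two equally sized groups. The function name `approx_power_two_sample` is hypothetical, and the approximation slightly overstates the power of a t-test on small samples.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized effect d (Cohen's d)."""
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = d * sqrt(n_per_group / 2)  # expected test statistic under the alternative
    # Probability that the statistic falls beyond either critical value.
    return standard_normal.cdf(shift - critical) + standard_normal.cdf(-shift - critical)

power = approx_power_two_sample(d=0.5, n_per_group=64)
```

With a medium effect (d = 0.5) and 64 participants per group, the approximation gives a power of roughly 0.8, the conventional target.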
structured interview
An interview in which every participant is asked exactly the same questions in the same order, enabling systematic comparison across respondents.
study
A systematic attempt to collect and analyze evidence in order to test a claim or answer a research question.
survivorship bias
Bias introduced by analyzing only cases that survived a selection process, ignoring cases that did not.

T

thematic analysis
A qualitative analysis method in which data is coded and grouped into themes through iterative passes.
time to restore service
The time it takes to recover normal operation after a failure in production; one of the four DORA metrics used to measure software delivery performance.
treatment
The intervention applied to the treatment group in an experiment.
triangulation
The use of multiple data sources, methods, or investigators to increase confidence in a finding.

U

unstructured interview
An interview with no fixed list of questions, guided entirely by the participant's responses, used to explore topics in depth without constraining the direction of conversation.
