Statistical Reference

This appendix summarizes the statistical concepts used in the lessons. It is not a statistics course: it is a reference for practitioners who need to decode a methods section without getting a degree first. Read it when a lesson references something you have forgotten, or use it to check whether a study you are evaluating applied the right tool.

Descriptive Statistics

Probability and Distributions

Hypothesis Testing

Effect Sizes

Statistical Power and Sample Size

Multiple Comparisons

Common Tests and When to Use Them

| Test | When to use | Assumptions |
| --- | --- | --- |
| Two-sample t-test | Compare means of two groups | Normality, approximately equal variance |
| Mann-Whitney U | Compare distributions of two groups | None (non-parametric) |
| Paired t-test | Compare means within pairs (before/after) | Paired differences are normal |
| Wilcoxon signed-rank | Paired comparison, non-parametric | Symmetric differences |
| Chi-square | Compare frequencies or proportions | Expected count ≥ 5 per cell |
| Pearson correlation | Linear relationship between two continuous variables | Bivariate normality |
| Spearman correlation | Monotone relationship; ordinal or skewed data | None |
| Linear regression | Model continuous outcome from predictors | Linearity, normality of residuals, homoscedasticity |
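
To make the difference between the Pearson and Spearman rows concrete, the following is a minimal sketch using only the Python standard library. The data are made up for illustration (a perfectly monotone but non-linear relationship), the helper names `pearson`, `ranks`, and `spearman` are hypothetical, and the ranking helper assumes there are no ties. It is not a substitute for a statistics package.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Ranks from 1 to n. Assumption: no tied values in the input."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Illustrative data only: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

print("Pearson: ", pearson(x, y))   # below 1, because the relationship is curved
print("Spearman:", spearman(x, y))  # 1.0, because the ordering is preserved exactly
```

On data like these, Spearman reports a perfect monotone relationship while Pearson understates it, which is why the table suggests Spearman for skewed or ordinal data.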

Correlation vs. Causation

Worked Example: Power Calculation

This worked example shows how the sample-size formula behind a power calculation is used in practice.

Suppose you want to study whether pair programming reduces defect density. From prior studies, you know the standard deviation of defect density across projects is about 4 defects per thousand lines of code (KLOC), so σ = 4. You want to detect a difference of 3 defects per KLOC (δ = 3), a meaningful reduction that would justify the cost of pair programming.

Using the formula n = 2(z_α + z_β)²σ²/δ², with z_α = 1.96 (two-sided α = 0.05) and z_β = 0.84 (80% power):

n = 2(1.96 + 0.84)² × 16 / 9 = 2 × (2.80)² × 16 / 9 = 2 × 7.84 × 16 / 9 = 250.88 / 9 ≈ 28

You need about 28 projects per condition, 56 in total. If you can recruit only 10 projects per condition, the study is underpowered: under the same normal approximation its power is only about 39%, so it is more likely than not to miss a real effect of this size. The formula also shows that detecting smaller effects requires much larger samples. Because n scales with 1/δ², detecting δ = 1 (keeping σ = 4) takes about nine times as many: n = 2 × 7.84 × 16 / 1 ≈ 251 projects per condition.
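
The arithmetic above can be reproduced with a short script. This is a sketch under the same assumptions as the formula (two-sided test, two equal-sized groups, normal approximation). The function names `sample_size_per_group` and `approximate_power` are illustrative, not from any library. Because the script uses unrounded quantiles rather than 1.96 and 0.84, the δ = 1 case comes out slightly above 251 and rounds up to 252.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n from n = 2(z_alpha + z_beta)^2 * sigma^2 / delta^2, rounded up."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for two-sided alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

def approximate_power(n, sigma, delta, alpha=0.05):
    """Approximate power with n per group, ignoring the negligible opposite tail."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    standard_error = sigma * math.sqrt(2 / n)  # SE of the difference in means
    return z.cdf(delta / standard_error - z_alpha)

print(sample_size_per_group(sigma=4, delta=3))   # 28 per condition
print(sample_size_per_group(sigma=4, delta=1))   # 252 with unrounded quantiles
print(round(approximate_power(n=10, sigma=4, delta=3), 2))  # power with 10 per condition
```

Rounding up rather than to the nearest integer is a deliberate choice here: a fractional project cannot be recruited, and rounding down would leave the study slightly short of the target power.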

In practice, the hardest part of a power calculation is estimating σ and choosing δ. If your estimate of σ is too low, you will underpower your study; if your δ is unrealistically small, you will conclude the study is infeasible when it may not be. When in doubt, use a range of plausible values and report the sensitivity of your sample size to those choices.
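
One way to report that sensitivity is to tabulate the required sample size over a grid of plausible values. The sketch below assumes the same formula and defaults as the worked example (two-sided α = 0.05, 80% power). The ranges for σ and δ are hypothetical placeholders, so substitute values that are defensible for your own setting.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n from n = 2(z_alpha + z_beta)^2 * sigma^2 / delta^2, rounded up."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * z_total ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical ranges bracketing the worked example (sigma = 4, delta = 3).
sigmas = [3, 4, 5]
deltas = [2, 3, 4]

# Required n per condition for every (sigma, delta) combination.
table = {(s, d): sample_size_per_group(s, d) for s in sigmas for d in deltas}

print("sigma  " + "  ".join(f"delta={d}" for d in deltas))
for s in sigmas:
    print(f"{s:>5}  " + "  ".join(f"{table[(s, d)]:>7}" for d in deltas))
```

Across this small grid the required sample runs from 9 to 99 projects per condition, which shows how strongly the answer depends on the two inputs that are hardest to pin down.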