Terms

Back in 1975, Fred Brooks wrote:

Show me your flowcharts and conceal your tables, and I shall continue to be mystified; show me your tables and I won’t usually need your flowcharts: they’ll be obvious.

Along the same lines, listing the terms that someone needs to know in order to understand something is a quick and dirty way to figure out what a lesson about that thing needs to cover. I have therefore gone through two dozen empirical studies in software engineering and pulled out the terms they use that computer science undergraduates are unlikely to know. It’s an intimidating list, but if we want to teach software engineers how to apply data science to software engineering problems and how to understand empirical software engineering research, I think we’ll have to cover most of it.

The papers these terms were found in are listed after the terms.

accuracy
alternative hypothesis
Amdahl's Law
analysis of variance
Bayes' Rule
Benjamini-Hochberg p-value correction
Bernoulli distribution
Bessel correction
binomial distribution
Bonferroni correction
box-and-whisker plot
central moment
Chebyshev's Inequality
chi-square test
Cliff's δ
Cohen's d
Cohen's kappa
conditional probability
confidence interval
continuity correction
convergence
correlation coefficient
covariance
covariance matrix
cumulative distribution function
dataframe
degrees of freedom
dependent variable
descriptive statistics
effect size
expected value
explanatory variable
F-measure
F-test
false negative
false positive
Gamma distribution
Gamma function
geometric distribution
goal-question-metric
Greenhouse-Geisser correction
harmonic mean
histogram
independent variable
interquartile range
Kano scale
Kruskal-Wallis test
Likert scale
linear regression
logistic regression
long tail
Mann-Whitney U test
Mauchly's test for sphericity
maximum likelihood estimation
mean
median
method of moments
multiple linear regression
n-gram analysis
negative binomial distribution
negative binomial regression
Noble's Rules
Not a Number
normal distribution
nuisance factor
null hypothesis
one-sided distribution
outlier
overdispersion
p-hacking
p-value
Poisson distribution
pooled sample variance
population
population moment
power law distribution
precision
principal component analysis
probability density function
probability mass function
quartile
rank correlation
recall
response variable
sample
sample moment
sample variance
Shapiro-Wilk test
sigmoidal curve
Spearman's rank correlation
standard deviation
standard normal distribution
standard uniform distribution
statistic
statistical model
t-distribution
t-test
tidy data
uniform distribution
variance
violin plot
Wilcoxon rank-sum test
Wilcoxon signed rank test
z-test
Zipf's Law
Zipf-Mandelbrot distribution
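Several of these terms map directly onto a few lines of code. The sketch below is illustrative only. It assumes Python's standard `statistics` module. The function names `describe` and `classification_scores` and the sample numbers are hypothetical and are not taken from the papers. It shows the mean, median, quartiles and interquartile range, and the difference between sample variance (Bessel correction, dividing by n - 1) and population variance (dividing by n). It also shows accuracy, precision, recall and the F-measure computed as a harmonic mean.

```python
import statistics


def describe(values):
    """Descriptive statistics for a sample of numbers (hypothetical helper)."""
    # statistics.quantiles uses the 'exclusive' method by default; other tools
    # define quartiles slightly differently, so results can vary between packages.
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # statistics.variance divides by n - 1 (Bessel correction) ...
        "sample_variance": statistics.variance(values),
        # ... while statistics.pvariance divides by n (population variance).
        "population_variance": statistics.pvariance(values),
        "standard_deviation": statistics.stdev(values),
        "interquartile_range": q3 - q1,
    }


def classification_scores(tp, fp, fn, tn):
    """Scores from counts of true/false positives and negatives (hypothetical helper).

    Assumes at least one predicted positive (tp + fp > 0) and at least one
    actual positive (tp + fn > 0); otherwise a ZeroDivisionError is raised.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # fraction of predicted positives that are real
    recall = tp / (tp + fn)     # fraction of real positives that were found
    # The F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = statistics.harmonic_mean([precision, recall])
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
    print(classification_scores(tp=8, fp=2, fn=4, tn=86))
```

With the sample above, the sample variance (32/7) is larger than the population variance (4). The Bessel correction exists for exactly this reason: dividing by n tends to underestimate the population's variance when you only have a sample.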

The papers are:
