Terms

Posted 2019-12-20

Back in 1975, Fred Brooks wrote:

Show me your flowcharts and conceal your tables, and I shall continue to be mystified; show me your tables and I won’t usually need your flowcharts: they’ll be obvious.

Along the same lines, listing the terms someone needs to know in order to understand a topic is a quick-and-dirty way to figure out what a lesson on that topic needs to cover. I have therefore gone through two dozen empirical studies on software engineering and pulled out the terms they use that computer science undergraduates are unlikely to know. It’s an intimidating list, but if we want to teach software engineers how to apply data science to software engineering problems and how to understand empirical software engineering research, I think we’ll have to cover most of it.

The papers these terms were drawn from are listed below the table.

accuracy
alternative hypothesis
Amdahl's Law
analysis of variance
Bayes' Rule
Benjamini-Hochberg p-value correction
Bernoulli distribution
Bessel correction
binomial distribution
Bonferroni correction
box-and-whisker plot
central moment
Chebyshev's Inequality
chi-square test
Cliff's δ
Cohen's d
Cohen's kappa
conditional probability
confidence interval
continuity correction
convergence
correlation coefficient
covariance
covariance matrix
cumulative distribution function
dataframe
degrees of freedom
dependent variable
descriptive statistics
effect size
expected value
explanatory variable
F-measure
F-test
false negative
false positive
Gamma distribution
Gamma function
geometric distribution
goal-question-metric
Greenhouse-Geisser correction
harmonic mean
histogram
independent variable
interquartile range
Kano scale
Kruskal-Wallis test
Likert scale
linear regression
logistic regression
long tail
Mann-Whitney U test
Mauchly's test for sphericity
maximum likelihood estimation
mean
median
method of moments
multiple linear regression
n-gram analysis
negative binomial distribution
negative binomial regression
Noble's Rules
Not a Number
normal distribution
nuisance factor
null hypothesis
one-sided distribution
outlier
overdispersion
p-hacking
p-value
Poisson distribution
pooled sample variance
population
population moment
power law distribution
precision
principal component analysis
probability density function
probability mass function
quartile
rank correlation
recall
response variable
sample
sample moment
sample variance
Shapiro-Wilk test
sigmoidal curve
Spearman's rank correlation
standard deviation
standard normal distribution
standard uniform distribution
statistic
statistical model
t-distribution
t-test
tidy data
uniform distribution
variance
violin plot
Wilcoxon rank-sum test
Wilcoxon signed rank test
z-test
Zipf's Law
Zipf-Mandelbrot distribution

The papers are:

Laurence Aitchison, Nicola Corradi, and Peter E. Latham: “Zipf’s Law Arises Naturally When There Are Underlying, Unobserved Variables”. PLOS Computational Biology, 12(12), Dec 2016, doi:10.1371/journal.pcbi.1005110.
Amjad Altadmri and Neil C.C. Brown: “37 Million Compilations”. In Proc. 46th ACM Technical Symposium on Computer Science Education - SIGCSE’15, 2015, doi:10.1145/2676723.2677258. Summarizes the authors’ analysis of novice programming mistakes.
B.C.D. Anda, D.I.K. Sjoberg, and A. Mockus: “Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System”. IEEE Transactions on Software Engineering, 35(3), May 2009, doi:10.1109/tse.2008.89.
Andrew Begel and Thomas Zimmermann: “Analyze this! 145 questions for data scientists in software engineering”. In Proc. 36th International Conference on Software Engineering - ICSE’14, 2014, doi:10.1145/2568225.2568233.
Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu: “Don’t touch my code!”. In Proc. 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering - SIGSOFT/FSE’11, 2011, doi:10.1145/2025113.2025119.
Allen C. Bluedorn, Daniel B. Turban, and Mary Sue Love: “The effects of stand-up and sit-down meeting formats on meeting outcomes”. Journal of Applied Psychology, 84(2), 1999, doi:10.1037/0021-9010.84.2.277.
Sebastian G. Elbaum and John C. Munson: “Code churn: A measure for estimating the impact of code change”. In Proc. International Conference on Software Maintenance, 1998.
Denae Ford, Justin Smith, Philip J. Guo, and Chris Parnin: “Paradise unplugged: identifying barriers for female participation on stack overflow”. In Proc. 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2016, 2016, doi:10.1145/2950290.2950331.
Davide Fucci, Giuseppe Scanniello, Simone Romano, Martin Shepperd, Boyce Sigweni, Fernando Uyaguari, Burak Turhan, Natalia Juristo, and Markku Oivo: “An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach”. In Proc. 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM’16, 2016, doi:10.1145/2961111.2962592.
Daniel Graziotin, Xiaofeng Wang, and Pekka Abrahamsson: “Happy software developers solve problems better: psychological measurements in empirical software engineering”. PeerJ, 2, Mar 2014, doi:10.7717/peerj.289.
Stefan Hanenberg: “An experiment about static and dynamic type systems”. ACM SIGPLAN Notices, 45(10), Oct 2010, doi:10.1145/1932682.1869462.
J.E. Hannay, E. Arisholm, H. Engvik, and D.I.K. Sjoberg: “Effects of Personality on Pair Programming”. IEEE Transactions on Software Engineering, 36(1), Jan 2010, doi:10.1109/tse.2009.41.
Magne Jorgensen and Stein Grimstad: “The Impact of Irrelevant and Misleading Information on Software Development Effort Estimates: A Randomized Controlled Field Experiment”. IEEE Transactions on Software Engineering, 37(5), Sep 2011, doi:10.1109/tse.2010.78.
Foutse Khomh, Tejinder Dhaliwal, Ying Zou, and Bram Adams: “Do faster releases improve software quality? An empirical case study of Mozilla Firefox”. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), Jun 2012, doi:10.1109/msr.2012.6224279.
Shane McIntosh, Bram Adams, Thanh H.D. Nguyen, Yasutaka Kamei, and Ahmed E. Hassan: “An empirical study of build maintenance effort”. In Proc. 33rd International Conference on Software Engineering - ICSE’11, 2011, doi:10.1145/1985793.1985813.
Andrew Meneely, Pete Rotella, and Laurie Williams: “Does adding manpower also affect quality?”. In Proc. 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering - SIGSOFT/FSE’11, 2011, doi:10.1145/2025113.2025128.
André N. Meyer, Thomas Fritz, Gail C. Murphy, and Thomas Zimmermann: “Software developers’ perceptions of productivity”. In Proc. 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2014, 2014, doi:10.1145/2635868.2635892.
Suman Nakshatri, Maithri Hegde, and Sahithi Thandra: “Analysis of exception handling patterns in Java projects”. In Proc. 13th International Workshop on Mining Software Repositories - MSR’16, 2016, doi:10.1145/2901739.2903499.
Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker: “Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software”. In Proc. 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015, 2015, doi:10.1145/2786805.2786852.
Andreas Zeller, Thomas Zimmermann, and Christian Bird: “Failure is a four-letter word”. In Proc. 7th International Conference on Predictive Models in Software Engineering - Promise’11, 2011, doi:10.1145/2020390.2020395.

Categories: technical-writing