Terms
Back in 1975, Fred Brooks wrote:
“Show me your flowcharts and conceal your tables, and I shall continue to be mystified; show me your tables and I won’t usually need your flowcharts: they’ll be obvious.”
Along the same lines, listing the terms someone needs to know in order to understand a subject is a quick and dirty way to figure out what a lesson on that subject needs to cover. I have therefore gone through two dozen empirical studies in software engineering and pulled out the terms they use that computer science undergraduates are unlikely to know. It’s an intimidating list, but if we want to teach software engineers how to apply data science to software engineering problems and how to understand empirical software engineering research, I think we’ll have to cover most of it.
The papers in which these terms were found are listed after the terms themselves.
accuracy, alternative hypothesis, Amdahl's Law, analysis of variance, Bayes' Rule, Benjamini-Hochberg p-value correction, Bernoulli distribution, Bessel correction, binomial distribution, Bonferroni correction, box-and-whisker plot, central moment, Chebyshev's Inequality, chi-square test, Cliff's δ, Cohen's d, Cohen's kappa, conditional probability, confidence interval, continuity correction, convergence, correlation coefficient, covariance, covariance matrix, cumulative distribution function, dataframe, degrees of freedom, dependent variable, descriptive statistics, effect size, expected value, explanatory variable, F-measure, F-test, false negative, false positive, Gamma distribution, Gamma function, geometric distribution, goal-question-metric, Greenhouse-Geisser correction, harmonic mean, histogram, independent variable, interquartile range, Kano scale, Kruskal-Wallis test, Likert scale, linear regression, logistic regression, long tail, Mann-Whitney U test, Mauchly's test for sphericity, maximum likelihood estimation,
mean, median, method of moments, multiple linear regression, n-gram analysis, negative binomial distribution, negative binomial regression, Noble's Rules, Not a Number, normal distribution, nuisance factor, null hypothesis, one-sided distribution, outlier, overdispersion, p hacking, p value, Poisson distribution, pooled sample variance, population, population moment, power law distribution, precision, principal component analysis, probability density function, probability mass function, quartile, rank correlation, recall, response variable, sample, sample moment, sample variance, Shapiro-Wilk test, sigmoidal curve, Spearman's rank correlation, standard deviation, standard normal distribution, standard uniform distribution, statistic, statistical model, t-distribution, t-test, tidy data, uniform distribution, variance, violin plot, Wilcoxon rank-sum test, Wilcoxon signed rank test, z-test, Zipf's Law, Zipf-Mandelbrot distribution
The papers are:
- Laurence Aitchison, Nicola Corradi, and Peter E. Latham: “Zipf’s Law Arises Naturally When There Are Underlying, Unobserved Variables”. PLOS Computational Biology, 12(12), Dec 2016, doi:10.1371/journal.pcbi.1005110.
- Amjad Altadmri and Neil C.C. Brown: “37 Million Compilations”. In Proc. 46th ACM Technical Symposium on Computer Science Education - SIGCSE’15, 2015, doi:10.1145/2676723.2677258. Summarizes the authors’ analysis of novice programming mistakes.
- B.C.D. Anda, D.I.K. Sjoberg, and A. Mockus: “Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System”. IEEE Transactions on Software Engineering, 35(3), May 2009, doi:10.1109/tse.2008.89.
- Andrew Begel and Thomas Zimmermann: “Analyze this! 145 questions for data scientists in software engineering”. In Proc. 36th International Conference on Software Engineering - ICSE’14, 2014, doi:10.1145/2568225.2568233.
- Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu: “Don’t touch my code!”. In Proc. 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering - SIGSOFT/FSE’11, 2011, doi:10.1145/2025113.2025119.
- Allen C. Bluedorn, Daniel B. Turban, and Mary Sue Love: “The effects of stand-up and sit-down meeting formats on meeting outcomes”. Journal of Applied Psychology, 84(2), 1999, doi:10.1037/0021-9010.84.2.277.
- Sebastian G. Elbaum and John C. Munson: “Code churn: A measure for estimating the impact of code change”. In Proc. International Conference on Software Maintenance, 1998.
- Denae Ford, Justin Smith, Philip J. Guo, and Chris Parnin: “Paradise unplugged: identifying barriers for female participation on Stack Overflow”. In Proc. 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2016, 2016, doi:10.1145/2950290.2950331.
- Davide Fucci, Giuseppe Scanniello, Simone Romano, Martin Shepperd, Boyce Sigweni, Fernando Uyaguari, Burak Turhan, Natalia Juristo, and Markku Oivo: “An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach”. In Proc. 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM’16, 2016, doi:10.1145/2961111.2962592.
- Daniel Graziotin, Xiaofeng Wang, and Pekka Abrahamsson: “Happy software developers solve problems better: psychological measurements in empirical software engineering”. PeerJ, 2, Mar 2014, doi:10.7717/peerj.289.
- Stefan Hanenberg: “An experiment about static and dynamic type systems”. ACM SIGPLAN Notices, 45(10), Oct 2010, doi:10.1145/1932682.1869462.
- J.E. Hannay, E. Arisholm, H. Engvik, and D.I.K. Sjoberg: “Effects of Personality on Pair Programming”. IEEE Transactions on Software Engineering, 36(1), Jan 2010, doi:10.1109/tse.2009.41.
- Magne Jorgensen and Stein Grimstad: “The Impact of Irrelevant and Misleading Information on Software Development Effort Estimates: A Randomized Controlled Field Experiment”. IEEE Transactions on Software Engineering, 37(5), Sep 2011, doi:10.1109/tse.2010.78.
- Foutse Khomh, Tejinder Dhaliwal, Ying Zou, and Bram Adams: “Do faster releases improve software quality? An empirical case study of Mozilla Firefox”. In Proc. 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), Jun 2012, doi:10.1109/msr.2012.6224279.
- Shane McIntosh, Bram Adams, Thanh H.D. Nguyen, Yasutaka Kamei, and Ahmed E. Hassan: “An empirical study of build maintenance effort”. In Proc. 33rd International Conference on Software Engineering - ICSE’11, 2011, doi:10.1145/1985793.1985813.
- Andrew Meneely, Pete Rotella, and Laurie Williams: “Does adding manpower also affect quality?”. In Proc. 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering - SIGSOFT/FSE’11, 2011, doi:10.1145/2025113.2025128.
- André N. Meyer, Thomas Fritz, Gail C. Murphy, and Thomas Zimmermann: “Software developers’ perceptions of productivity”. In Proc. 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2014, 2014, doi:10.1145/2635868.2635892.
- Suman Nakshatri, Maithri Hegde, and Sahithi Thandra: “Analysis of exception handling patterns in Java projects”. In Proc. 13th International Workshop on Mining Software Repositories - MSR’16, 2016, doi:10.1145/2901739.2903499.
- Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker: “Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software”. In Proc. 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015, 2015, doi:10.1145/2786805.2786852.
- Andreas Zeller, Thomas Zimmermann, and Christian Bird: “Failure is a four-letter word”. In Proc. 7th International Conference on Predictive Models in Software Engineering - Promise’11, 2011, doi:10.1145/2020390.2020395.