Bibliography

Citations used in the main body of this material have short descriptions here. For descriptions of the other citations, see Further Reading; for the papers themselves, please see the pdf directory in the project repository.

A

Abad2018
Zahra Shakeri Hossein Abad, Oliver Karras, Kurt Schneider, Ken Barker, and Mike Bauer: "Task Interruption in Software Development Projects: What Makes some Interruptions More Disruptive than Others?" arXiv 1805.05508, 2018, 10.48550/arXiv.1805.05508.
AlencarDaCosta2017
Daniel Alencar da Costa, Shane McIntosh, Weiyi Shang, Uira Kulesza, Roberta Coelho, and Ahmed E. Hassan: "A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes." IEEE Trans. Software Engineering, 43(7), 641-657, 2017, 10.1109/tse.2016.2616306.
Anda2009
B.C.D. Anda, D.I.K. Sjøberg, and A. Mockus: "Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System." IEEE Trans. Software Engineering, 35(3), 407-429, 2009, 10.1109/tse.2008.89.
Aniche2021
Mauricio Aniche, Christoph Treude, and Andy Zaidman: "How Developers Engineer Test Cases: An Observational Study." IEEE Trans. Software Engineering, 2021, 10.1109/tse.2021.3129889.
Aranda2009
Jorge Aranda and Gina Venolia: "The secret life of bugs: Going past the errors and omissions in software repositories." Proc. ICSE'09, 2009, 10.1109/icse.2009.5070530.

B

Baltes2025
Sebastian Baltes, Florian Angermeir, Chetan Arora, Marvin Muñoz Barón, Chunyang Chen, Lukas Böhme, Fabio Calefato, Neil Ernst, Davide Falessi, Brian Fitzgerald, Davide Fucci, Marcos Kalinowski, Stefano Lambiase, Daniel Russo, Mircea Lungu, Lutz Prechelt, Paul Ralph, Christoph Treude, and Stefan Wagner: "Evaluation Guidelines for Empirical Studies in Software Engineering involving LLMs." arXiv 2508.15503, 2025, 10.48550/arXiv.2508.15503.
Bano2025
Muneera Bano, Hashini Gunatilake, and Rashina Hoda: "What Does a Software Engineer Look Like? Exploring Societal Stereotypes in LLMs." arXiv 2501.03569, 2025, 10.48550/arXiv.2501.03569.
Basili1987
V.R. Basili and R.W. Selby: "Comparing the Effectiveness of Software Testing Strategies." IEEE Trans. Software Engineering, SE-13(12), 1278-1296, 1987, 10.1109/tse.1987.232881.
Basili1994
Victor R. Basili, Gianluigi Caldiera, and H. Dieter Rombach: "The Goal Question Metric Approach." In John Marciniak (ed.), Encyclopedia of Software Engineering, Wiley, 1994.
Introduces the Goal/Question/Metric (GQM) framework for systematically defining software measurements by linking metrics to explicit goals and intermediate questions.
Bauer2019
Jennifer Bauer, Janet Siegmund, Norman Peitek, Johannes C. Hofmeister, and Sven Apel: "Indentation: Simply a Matter of Style or Support for Program Comprehension?" Proc. ICPC'19, 2019, 10.1109/icpc.2019.00033.
Beck2023
Kent Beck: "Measuring Developer Productivity: Real-World Examples." Medium, 2023, https://tidyfirst.substack.com/p/measuring-developer-productivity.
Begel2014
Andrew Begel and Nachiappan Nagappan: "Analyze This! 145 Questions for Data Scientists in Software Engineering." Proc. ICSE'14, 2014, 10.1145/2568225.2568233.
Two surveys producing 145 data science questions for SE research in 12 categories; engineers prioritize customer usage questions; oppose questions assessing or comparing individual employee performance.
Behroozi2020
Mahnaz Behroozi, Shivani Shirolkar, Titus Barik, and Chris Parnin: "Does Stress Impact Technical Interview Performance?" Proc. ESEC/FSE'20, 481-492, 2020, 10.1145/3368089.3409712.
Beller2015
Moritz Beller, Georgios Gousios, Annibale Panichella, and Andy Zaidman: "When, how, and why developers (do not) test in their IDEs." Proc. FSE'15, 2015, 10.1145/2786805.2786843.
Large-scale field study of 416 software engineers over 5 months (13+ years of IDE activity); majority do not test; TDD not widely practiced; developers spend 25% of time on tests but believe they spend 50%.
Beller2018
Moritz Beller, Niels Spruit, Diomidis Spinellis, and Andy Zaidman: "On the Dichotomy of Debugging Behavior Among Programmers." Proc. ICSE'18, 2018, 10.1145/3180155.3180175.
Beller2019
Moritz Beller, Georgios Gousios, Annibale Panichella, Sebastian Proksch, Sven Amann, and Andy Zaidman: "Developer Testing in the IDE: Patterns, Beliefs, and Behavior." IEEE Trans. Software Engineering, 45(3), 261-284, 2019, 10.1109/tse.2017.2776152.
Bettenburg2008
Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann: "What makes a good bug report?" Proc. SIGSOFT/FSE'08, 2008, 10.1145/1453101.1453146.
A survey of 466 Apache, Eclipse, and Mozilla developers identified which elements of bug reports practitioners find most useful. Stack traces, test cases, and steps to reproduce ranked highest, while information that reporters consider important was often missing from submitted reports.
Bird2011
Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu: "Don't Touch My Code! Examining the Effects of Ownership on Software Quality." Proc. SIGSOFT/FSE'11, 2011, 10.1145/2025113.2025119.
Bogart2016
Christopher Bogart, Christian Kästner, James Herbsleb, and Ferdian Thung: "How to break an API: cost negotiation and community values in three software ecosystems." Proc. FSE'16, 2016, 10.1145/2950290.2950325.
Bouzenia2025
Islem Bouzenia and Michael Pradel: "Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories." Proc. ASE'25, 2025, 10.1109/ASE63991.2025.002344.
Brandt2009
Joel Brandt, Philip J. Guo, Joel Lewenstein, Mira Dontcheva, and Scott R. Klemmer: "Two studies of opportunistic programming: interleaving web foraging, learning, and writing code." Proc. CHI'09, 2009, 10.1145/1518701.1518944.
Two studies (lab + query log) of programmers using web resources; identifies three purposes: just-in-time learning, knowledge extension, and reminding; queries for different purposes differ in style and duration.
Braun2019
Virginia Braun and Victoria Clarke: "Reflecting on Reflexive Thematic Analysis." Qualitative Research in Sport, Exercise and Health, 11(4), 2019, 10.1080/2159676X.2019.1628806.
Clarifies reflexive thematic analysis as a distinct qualitative methodology, contrasting it with other forms of thematic analysis.
Brown2024
Eva Maxfield Brown, Cailean Osborne, Peter Cihon, Moritz Böhmecke-Schwafert, Kevin Xu, Mirko Boehm, and Knut Blind: "Measuring Software Innovation with Open Source Software Development Data." arXiv 2411.05087, 2024, 10.48550/arXiv.2411.05087.
Butler2023
Jenna Butler, Thomas Zimmermann, and Christian Bird: "Objectives and Key Results in Software Teams: Challenges, Opportunities and Impact on Development." arXiv 2311.00236, 2023, 10.48550/arXiv.2311.00236.

C

Campbell1963
Donald T. Campbell and Julian C. Stanley: Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin, 1963.
Classic research design textbook establishing the concepts of internal and external validity and the taxonomy of experimental and quasi-experimental designs.
Cosentino2016
Valerio Cosentino, Javier Luis, and Jordi Cabot: "Findings from GitHub." Proc. MSR'16, 2016, 10.1145/2901739.2901776.

D

Davis2023
Matthew C. Davis, Emad Aghayi, Thomas D. LaToza, Xiaoyin Wang, Brad A. Myers, and Joshua Sunshine: "What's (Not) Working in Programmer User Studies?" ACM Trans. Software Engineering and Methodology, 32(5), 1-32, 2023, 10.1145/3587157.
Devanbu2016
Prem Devanbu, Thomas Zimmermann, and Christian Bird: "Belief & evidence in empirical software engineering." Proc. ICSE'16, 2016, 10.1145/2884781.2884812.
Case study of developer beliefs at Microsoft vs. empirical project data; beliefs are strong but formed from personal experience rather than research; do not reliably correspond to actual evidence; recommends better dissemination of empirical findings.
Diener2010
Ed Diener, Derrick Wirtz, William Tov, Chu Kim-Prieto, Dong-won Choi, Shigehiro Oishi, and Robert Biswas-Diener: "New well-being measures: Short scales to assess flourishing and positive and negative feelings." Social Indicators Research, 97, 2010, 10.1007/s11205-009-9493-y.
Presents validated short scales for measuring human flourishing and positive/negative affect as components of subjective well-being.

E

ElEmam2001
K. El Emam, S. Benlarbi, N. Goel, and S.N. Rai: "The confounding effect of class size on the validity of object-oriented metrics." IEEE Trans. Software Engineering, 27(7), 2001, 10.1109/32.935855.
ElHaji2024
Khalid El Haji, Carolin Brandt, and Andy Zaidman: "Using GitHub Copilot for Test Generation in Python: An Empirical Study." Proc. AST'24, 2024, 10.1145/3644032.3644443.
Erdogmus2005
Hakan Erdogmus, Maurizio Morisio, and Marco Torchiano: "On the Effectiveness of the Test-First Approach to Programming." IEEE Trans. Software Engineering, 31(3), 2005, 10.1109/tse.2005.37.
Controlled experiment finding that TDD does not inherently improve code quality, but that test quantity, regardless of writing order, was the key driver of programmer productivity.

F

FernandezPinto2023
Manuela Fernández Pinto and Daniel Fernández Pinto: "Epistemic diversity and industrial selection bias." Synthese, 201(5), 2023, 10.1007/s11229-023-04158-7.
Flournoy2025
John C. Flournoy, Carol S. Lee, Maggie Wu, and Catherine M. Hicks: "No Silver Bullets: Why Understanding Software Cycle Time is Messy, Not Magic." arXiv 2503.05040, 2025, 10.48550/arXiv.2503.05040.
Forsgren2018
Nicole Forsgren, Jez Humble, and Gene Kim: Accelerate: The Science of Lean Software and DevOps. IT Revolution Press, 2018.
Presents evidence that elite DevOps organizations achieve both high speed and stability, and identifies CI/CD, lean management, and learning culture as the key predictors of performance.
Forsgren2021
Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler: "The SPACE of Developer Productivity." ACM Queue, 19(1), 2021.
Brief commentary noting developer productivity is more complex than commonly assumed.
Fu2025
Yujia Fu, Peng Liang, Amjed Tahir, Zengyang Li, Mojtaba Shahin, Jiaxin Yu, and Jinfu Chen: "Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study." ACM Trans. Software Engineering and Methodology, 34(8), 2025, 10.1145/3716848.
Fucci2013
Davide Fucci, Burak Turhan, and Markku Oivo: "Impact of Process Conformance on the Effects of Test-Driven Development." Proc. ESEM'13, 2013, 10.1109/esem.2013.19.
Fucci2016
Davide Fucci, Giuseppe Scanniello, Simone Romano, Martin Shepperd, Boyce Sigweni, Fernando Uyaguari, Burak Turhan, Natalia Juristo, and Markku Oivo: "An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach." Proc. ESEM'16, 2016, 10.1145/2961111.2962592.
Fucci2017
Davide Fucci, Hakan Erdogmus, Burak Turhan, Markku Oivo, and Natalia Juristo: "A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?" IEEE Trans. Software Engineering, 43(7), 2017, 10.1109/tse.2016.2616877.
Fucci2018
Davide Fucci, Giuseppe Scanniello, Simone Romano, and Natalia Juristo: "Need for Sleep: The Impact of a Night of Sleep Deprivation on Novice Developers' Performance." arXiv 1805.02544, 2018, 10.48550/arXiv.1805.02544.

G

Girardi2020
Daniela Girardi, Nicole Novielli, Davide Fucci, and Filippo Lanubile: "Recognizing Developers' Emotions While Programming." Proc. ICSE'20, 2020, 10.1145/3377811.3380374.
Gold2020
Nicolas E. Gold and Jens Krinke: "Ethical Mining." Proc. MSR'20, 2020, 10.1145/3379597.3387462.
Goodhart1984
Charles Goodhart: "Problems of Monetary Management: The U.K. Experience." In Anthony Courakis (ed.), Inflation, Depression, and Economic Policy in the West, Rowman and Littlefield, 1984.
Articulates what became Goodhart's Law: when a measure becomes a policy target it ceases to be a good measure, because agents optimize for the metric rather than the underlying goal it represents.
Gote2022
Christoph Gote, Pavlin Mavrodiev, Frank Schweitzer, and Ingo Scholtes: "Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set." arXiv 2201.04588, 2022, 10.48550/arXiv.2201.04588.
Graziotin2018
Daniel Graziotin, Fabian Fagerholm, Xiaofeng Wang, and Pekka Abrahamsson: "What Happens When Software Developers Are (Un)Happy." Journal of Systems and Software, 140, 2018, 10.1016/j.jss.2018.02.041.
Mixed-methods study finding that unhappy developers are less productive, produce lower-quality work, and have higher intent to leave their jobs.

H

Hall2019
Erika Hall: Just Enough Research. A Book Apart, 2nd ed., 2019, 9781952616082.
Practical guide to user research for designers and product teams, arguing that the goal of research is to reduce uncertainty enough to act wisely, and that small, focused studies are almost always more useful than no research at all.
Harman2001
Mark Harman and Bryan F. Jones: "Search-Based Software Engineering." Information and Software Technology, 43(14), 2001.
Introduces search-based software engineering, proposing metaheuristic search techniques as a general framework for automating software engineering tasks as optimization problems.
Hindle2016
Abram Hindle, Earl T. Barr, Mark Gabel, Zhendong Su, and Premkumar Devanbu: "On the naturalness of software." Comm. ACM, 59(5), 2016, 10.1145/2902362.
N-gram models show code is more repetitive and predictable than natural language; validates naturalness hypothesis; demonstrates improved Java code completion in Eclipse using statistical language models.

I

Inozemtseva2014
Laura Inozemtseva and Reid Holmes: "Coverage is Not Strongly Correlated with Test Suite Effectiveness." Proc. ICSE'14, 2014, 10.1145/2568225.2568271.

J

Johnson2013
Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge: "Why don't software developers use static analysis tools to find bugs?" Proc. ICSE'13, 2013, 10.1109/icse.2013.6606613.
Interview study with 20 developers on why static analysis tools are underused; all felt use is beneficial but false positives and unhelpful warning presentation are the main barriers; recommends interactive defect-fixing mechanisms.
Junior2009
Gibeon Soares de Aquino Junior and Silvio Romero de Lemos Meira: "Towards Effective Productivity Measurement in Software Projects." Proc. SEA'09, 2009, 10.1109/icsea.2009.44.
Juristo2001
Natalia Juristo and Ana M. Moreno: Basics of Software Engineering Experimentation. Springer, 2001, 9780792379904.
Textbook introducing the principles and techniques of controlled experimentation in software engineering, covering design, analysis, and validity evaluation.

K

Kalliamvakou2014
Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. German, and Daniela Damian: "The Promises and Perils of Mining GitHub." Proc. MSR'14, 2014, 10.1145/2597073.2597074.
Empirical analysis of GitHub data revealing systematic biases including that most projects are personal and inactive, and that pull-request data routinely misrepresents actual collaboration.
Kamei2013
Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E. Hassan, Audris Mockus, Anand Sinha, and Naoyasu Ubayashi: "A large-scale empirical study of just-in-time quality assurance." IEEE Trans. Software Engineering, 39(6), 2013, 10.1109/tse.2012.70.
Kampenes2007
Vigdis By Kampenes, Tore Dybå, Jo Erskine Hannay, and Dag I.K. Sjøberg: "A Systematic Review of Effect Size in Software Engineering Experiments." Information and Software Technology, 49(11-12), 2007.
Systematic review finding that effect sizes are rarely reported in SE experiments and when reported are mostly small, suggesting many statistically significant SE results may not be practically meaningful.
Ko2007
Amy J. Ko, Robert DeLine, and Gina Venolia: "Information Needs in Collocated Software Development Teams." Proc. ICSE'07, 2007, 10.1109/icse.2007.45.
Observation study of 17 developers at a large software company; identifies 21 information types sought during change tasks; most frequently deferred: design rationale and program behavior; unavailable coworkers most common blocker.

L

Liang2024
Jenny T. Liang, Chenyang Yang, and Brad A. Myers: "A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges." Proc. ICSE'24, 2024, 10.1145/3597503.3608128.
Survey of 410 developers finding that AI coding assistants are valued for reducing keystrokes but that trust, correctness verification, and context-awareness remain significant usability challenges.

M

Maalej2014
Walid Maalej, Rebecca Tiarks, Tobias Roehm, and Rainer Koschke: "On the Comprehension of Program Comprehension." ACM Trans. Software Engineering and Methodology, 23(4), 2014, 10.1145/2622669.
Mark2008
Gloria Mark, Daniela Gudith, and Ulrich Klocke: "The Cost of Interrupted Work: More Speed and Stress." Proc. CHI'08, 2008.
Controlled study finding that interrupted workers compensate by working faster to complete tasks in equivalent time, but do so at the cost of significantly higher stress and frustration.
McKinsey2023
Nora Elsayed, Tarek Elhounsri, and Sven Blumberg: "Yes, You Can Measure Software Developer Productivity." McKinsey & Company, 2023, https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/yes-you-can-measure-software-developer-productivity.
An embarrassingly bad collection of muddled claims about measuring developer productivity. Somebody probably got promoted for writing it.
Medlock2002
Michael C. Medlock, Dennis Wixon, Mark Terrano, Ramon Ruflair, and Darrell Vaughan: "Using the RITE Method to Improve Products: A Definition and a Case Study." Proc. Usability Professionals Association Conference, 2002.
Introduces the Rapid Iterative Testing and Evaluation (RITE) method, in which usability problems identified during a session are addressed before the next session, enabling rapid iteration with small participant pools.
Meyer2017
André N. Meyer, Laura E. Barton, Gail C. Murphy, Thomas Zimmermann, and Thomas Fritz: "The Work Life of Developers: Activities, Switches and Perceived Productivity." IEEE Trans. Software Engineering, 43(12), 2017, 10.1109/tse.2017.2656886.
Monitoring 20 developers over 11 work-days shows more user input correlates with higher perceived productivity; emails and planned meetings correlate negatively; productivity is highly personal and varies by time of day.
Meyer2021
André N. Meyer, Earl T. Barr, Christian Bird, and Thomas Zimmermann: "Today Was a Good Day: The Daily Life of Software Developers." IEEE Trans. Software Engineering, 47(5), 2021, 10.1109/tse.2019.2904957.
Miller2025
Courtney Miller, Rudrajit Choudhuri, Mara Ulloa, Sankeerti Haniyur, Robert DeLine, Margaret-Anne Storey, Emerson Murphy-Hill, Christian Bird, and Jenna L. Butler: "'Maybe We Need Some More Examples:' Individual and Team Drivers of Developer GenAI Tool Use." arXiv 2507.21280, 2025, 10.48550/arXiv.2507.21280.
Mockus2010
Audris Mockus: "Organizational Volatility and Its Effects on Software Defects." Proc. SIGSOFT/FSE'10, 2010, 10.1145/1882291.1882311.
Muller2015
Sebastian C. Müller and Thomas Fritz: "Stuck and Frustrated or in Flow and Happy: Sensing Developers' Emotions and Progress." Proc. ICSE'15, 2015, 10.1109/icse.2015.334.
Lab study (n=17) of developer emotions and biometric sensors during change tasks; emotions correlate with perceived progress; classifier achieves 71% accuracy for positive/negative emotion and 68% for low/high progress.
Munaiah2017
Nuthan Munaiah, Steven Kroh, Craig Cabrey, and Meiyappan Nagappan: "Curating GitHub for engineered software projects." Empirical Software Engineering, 22(6), 2017, 10.1007/s10664-017-9512-6.
Framework classifying 1.8M+ GitHub repos as engineered software vs. noise; best classifier achieves 82% precision / 86% recall; outperforms stargazer-based approaches which have high precision but low recall.

N

Nagappan2008
Nachiappan Nagappan, Brendan Murphy, and Victor Basili: "The Influence of Organizational Structure on Software Quality: An Empirical Case Study." Proc. ICSE'08, 2008, 10.1145/1368088.1368160.
Newman2023
Kaia Newman, Madeline Endres, Brittany Johnson, and Westley Weimer: "From Organizations to Individuals: Psychoactive Substance Use By Professional Programmers." arXiv 2305.01056, 2023, 10.48550/arXiv.2305.01056.
Nielsen1993
Jakob Nielsen and Thomas K. Landauer: "A Mathematical Model of the Finding of Usability Problems." Proc. INTERACT'93 and CHI'93, 206-213, 1993, 10.1145/169059.169166.
Develops a mathematical model showing that approximately five participants are sufficient to identify most major usability problems in a focused task set, under the assumption of formative testing with a reasonably homogeneous user group.

O

Obi2024
Ike Obi, Jenna Butler, Sankeerti Haniyur, Brian Hassan, Margaret-Anne Storey, and Brendan Murphy: "Identifying Factors Contributing to Bad Days for Software Developers: A Mixed Methods Study." arXiv 2410.18379, 2024, 10.48550/arXiv.2410.18379.

P

Pearce2022
Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri: "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions." Proc. S&P'22, 2022, 10.1109/SP46214.2022.9833571.
Found that approximately 40% of code generated by GitHub Copilot across 89 security-relevant scenarios contained vulnerabilities drawn from the MITRE CWE Top 25 list.
Peng2023
Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer: "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot." arXiv 2302.06590, 2023, 10.48550/arXiv.2302.06590.
Randomized controlled experiment claiming that GitHub Copilot users completed a JavaScript coding task 55.8% faster than the control group.
Prechelt2000
Lutz Prechelt: "An Empirical Comparison of Seven Programming Languages." IEEE Computer, 33(10), 2000, 10.1109/2.876288.
Compares 80 implementations of a phone-code program in C, C++, Java, Perl, Python, Rexx, Tcl; scripting languages require less code and effort but are slower; significant variation within each language.

Q

R

Ray2017
Baishakhi Ray, Daryl Posnett, Premkumar Devanbu, and Vladimir Filkov: "A large-scale study of programming languages and code quality in GitHub." Comm. ACM, 60(10), 2017, 10.1145/3126905.
Risse2025
Niklas Risse, Jing Liu, and Marcel Böhme: "Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection." Proc. ACM Software Engineering, 2025, 10.1145/3728887.
Survey of ML vulnerability detection literature; 90% frame it as function-level binary classification; context is almost always necessary for accurate judgement; high scores achievable via spurious correlations; calls the prevailing problem statement ill-defined.

S

Sackman1968
Harold Sackman, W.J. Erikson, and E.E. Grant: "Exploratory Experimental Studies Comparing Online and Offline Programming Performance." Comm. ACM, 11(1), 1968.
Early empirical study finding up to 28:1 variation in individual programmer performance, which dwarfed any treatment effect and is often cited as the origin of the "10x programmer" concept.
Sadowski2019
Caitlin Sadowski and Thomas Zimmermann (eds.): Rethinking Productivity in Software Engineering. Apress, 2019, 9781484242216.
Edited volume collecting research and practitioner perspectives on how to understand, define, and measure software developer productivity.
SanchezRuiz2023
José Manuel Sánchez Ruiz, Francisco José Domínguez Mayo, Xavier Oriol, José Francisco Crespo, David Benavides, and Ernest Teniente: "A Benchmarking Proposal for DevOps Practices on Open Source Software Projects." arXiv 2304.14790, 2023, 10.48550/arXiv.2304.14790.
Sedano2017
Todd Sedano, Paul Ralph, and Cécile Péraire: "Software Development Waste." Proc. ICSE'17, 2017, 10.1109/icse.2017.20.
Two-year participant-observation study at Pivotal identifies 9 types of software development waste: wrong features, backlog mismanagement, rework, unnecessary complexity, cognitive load, psychological distress, waiting, knowledge loss, and poor communication.
Sillito2008
J. Sillito, G.C. Murphy, and K. De Volder: "Asking and Answering Questions during a Programming Change Task." IEEE Trans. Software Engineering, 34(4), 2008, 10.1109/tse.2008.26.
Two qualitative studies of programmers during change tasks; produces catalog of 44 question types; describes information-seeking behavior and how well existing tools support answering these questions.
Silva2016
Danilo Silva, Nikolaos Tsantalis, and Marco Tulio Valente: "Why We Refactor? Confessions of GitHub Contributors." Proc. FSE'16, 2016, 10.1145/2950290.2950305.
Spinellis2024
Diomidis Spinellis, Panos Louridas, Maria Kechagia, and Tushar Sharma: "Broken Windows: Exploring the Applicability of a Controversial Theory on Code Quality." arXiv 2410.13480, 2024, 10.48550/arXiv.2410.13480.
Stapleton2020
Sean Stapleton, Yashmeet Gambhir, Alexander LeClair, Zachary Eberhart, Westley Weimer, Kevin Leach, and Yu Huang: "A Human Study of Comprehension and Code Summarization." Proc. ICPC'20, 2020, 10.1145/3387904.3389258.
Storey2022
Margaret-Anne Storey, Brian Houck, and Thomas Zimmermann: "How Developers and Managers Define and Trade Productivity for Quality." Proc. CHASE'22, 2022, 10.1145/3528579.3529177.
Storey2024
Margaret-Anne Storey, Rashina Hoda, Alessandra Maciel Paz Milani, and Maria Teresa Baldassarre: "Guidelines for Using Mixed Methods Research in Software Engineering." arXiv 2404.06011, 2024, 10.48550/arXiv.2404.06011.

T

Thongtanunam2016
Patanamon Thongtanunam, Shane McIntosh, Ahmed E. Hassan, and Hajimu Iida: "Revisiting Code Ownership and Its Relationship with Software Quality in the Scope of Modern Code Review." Proc. ICSE'16, 2016, 10.1145/2884781.2884852.
Thornberg2014
Robert Thornberg and Kathy Charmaz: "Grounded Theory and Theoretical Coding." In Uwe Flick (ed.), The SAGE Handbook of Qualitative Data Analysis, SAGE, 2014, 10.4135/9781446282243.
Explains grounded theory and theoretical coding as qualitative analysis tools, emphasizing systematic yet flexible concept development from data.
Tregubov2017
Alexey Tregubov, Barry Boehm, Natalia Rodchenko, and Jo Ann Lane: "Impact of task switching and work interruptions on software development processes." Proc. ICSSP'17, 2017, 10.1145/3084100.3084116.
Treude2024
Christoph Treude: "Qualitative Data Analysis in Software Engineering: Techniques and Teaching Insights." arXiv 2406.08228, 2024, 10.48550/arXiv.2406.08228.
Tufano2017
Michele Tufano, Fabio Palomba, Gabriele Bavota, Rocco Oliveto, Massimiliano Di Penta, Andrea De Lucia, and Denys Poshyvanyk: "When and Why Your Code Starts to Smell Bad (and Whether the Smells Go Away)." IEEE Trans. Software Engineering, 43(11), 2017, 10.1109/tse.2017.2653105.
Large empirical study of 200 OSS project histories; most code smells are introduced when artifacts are created, not during evolution; 80% survive; only 9% of removed smells are directly caused by refactoring operations.

U

Uyaguari2024
Fernando Uyaguari, Silvia T. Acuña, John W. Castro, Davide Fucci, Oscar Dieste, and Sira Vegas: "Relevant information in TDD experiment reporting." ACM Trans. Software Engineering and Methodology, 2024, 10.1145/3688837.

V

Vartziotis2025
Tina Vartziotis, Maximilian Schmidt, George Dasoulas, Ippolyti Dellatolas, Stefano Attademo, Viet Dung Le, Anke Wiechmann, Tim Hoffmann, Michael Keckeisen, and Sotirios Kotsopoulos: "Carbon Footprint Evaluation of Code Generation through LLM as a Service." arXiv 2504.01036, 2025, 10.48550/arXiv.2504.01036.

W

Wessel2021
Mairieli Wessel, Igor Wiese, Igor Steinmacher, and Marco Aurelio Gerosa: "Don't Disturb Me: Challenges of Interacting with Software Bots on Open Source Software Projects." Proc. ACM Human-Computer Interaction, 2021, 10.1145/3476042.
Interview study of 21 OSS practitioners on bots in pull requests; identifies noise (overwhelming and distracting bot output) as central problem; develops theory of annoying bot behavior as noise; recommendations for bot and platform designers.
Wicherts2011
J.M. Wicherts, M. Bakker, and D. Molenaar: "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results." PLoS ONE, 6(11): e26828, 2011, 10.1371/journal.pone.0026828.
Found that the reluctance to share data was associated with weaker evidence and a higher prevalence of statistical errors. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.
Wohlin2000
Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén: Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, 2000, 9783662693056.
Introduces principles and methods for conducting controlled experiments in software engineering, covering design, execution, analysis, and identification of validity threats.
Wyrich2023
Marvin Wyrich: "Source Code Comprehension: A Contemporary Definition and Conceptual Model for Empirical Investigation." arXiv 2310.11301, 2023, 10.48550/arXiv.2310.11301.

X

Y

Z

Zieris2014
Franz Zieris and Lutz Prechelt: "On knowledge transfer skill in pair programming." Proc. ESEM'14, 2014, 10.1145/2652524.2652529.
Qualitative analysis of industrial pair programming recordings; efficient pairs avoid explaining multiple things at once, maintain topic focus, and clarify in stages; identifies knowledge transfer as a distinct skill beyond programming ability.