Recent Research Reading

I really did mean to blog more regularly about the research papers I'm reading... Oh well---here are the highlights from the last three months; together, they represent about 25% of what I've actually read. I haven't bothered hyperlinking, since many of them are behind paywalls, and the rest are easily googled (and I'm lazy).

Amrit Tiwana: "Impact of Classes of Development Coordination Tools on Software Development Performance: A Multinational Empirical Study". Divides projects into four categories---no novelty, conceptual novelty, process novelty, or both---and coordination tools into six---requirements managers, architectural modelers, test automation tools, test case development tools, configuration management, and ticketing systems---then looks at what effect each class of tool has on productivity for each kind of project. Turns out that each kind of tool helps some kinds of projects, but hinders others; no class of tool was helpful across the board. The paper is (at least) twice as long as it needs to be, but the results make the wading worthwhile.

Diane Kelly and Rebecca Sanders: "Assessing the Quality of Scientific Software" (SE-CSE workshop, 2008). A short, useful summary of scientists' quality assurance practices. As many people have noted, most commercial tools aren't helpful when (a) cosmetic rearrangement of the code changes the output, and (b) the whole reason you're writing a program is that you can't figure out what the answer ought to be.

Judith Segal: "Models of Scientific Software Development". Segal has spent the last few years doing field studies of scientists building software, and based on those has developed a model of their processes that is distinct in several ways from both classical and agile models.

Yanbing Yu, James A. Jones, and Mary Jean Harrold: "An Empirical Study of the Effects of Test-Suite Reduction on Fault Localization". A good test suite doesn't just find bugs; it helps you figure out where they are.
The authors apply several heuristics to reduce the number of tests being run, and see what impact they have on how well those tests localize bugs. The results aren't earth-shattering, but are another step toward a new generation of better testing tools.

Peter Rigby, Daniel German, and Margaret-Anne Storey: "Open Source Software Peer Review Practices: A Case Study of the Apache Server". Lots of people have studied code review in closed commercial shops (the best recent example being the one done by SmartBear at Cisco); here, the authors reverse engineer review frequency, artifact sizes, and so on from the Apache mailing list archives and version control repository to show that Apache's practices are characterized by (1) early, frequent reviews (2) of small, independent, complete contributions (3) conducted asynchronously by a potentially large, but actually small, group of self-selected experts.

Thorsten Schäfer, Jan Jonas, and Mira Mezini: "Mining Framework Usage Changes from Instantiation Code". This is another cool-toward-a-tool idea: extract information from code that has already been hand-ported to a new version of a library or framework, then apply it to other code that needs porting.

Danny Dig, Kashif Manzoor, Ralph Johnson, and Tien N. Nguyen: "Effective Software Merging in the Presence of Object-Oriented Refactorings". Text-based version control systems don't handle refactoring as well as they could, since they're blind to the semantics of what developers are doing. This paper presents a tool called MolhadoRef that takes language-level information into account when merging changes; it also reports on a case study and a controlled experiment to show that the tool does a better job than text-based merging.

Romain Robbes and Michele Lanza: "SpyWare: A Change-Aware Development Toolset".
The authors have built an IDE plugin that tracks the changes developers make, stores that log in a repository, then tries to use that information to help the developers do whatever they do next. Interesting to read it back-to-back with the preceding paper...

Wojciech James Dzidek, Erik Arisholm, and Lionel C. Briand: "A Realistic Empirical Evaluation of the Costs and Benefits of UML in Software Maintenance". The authors report a 54% improvement in the functional correctness of changes for a 14% increase in development time. I'm not completely convinced, but it's a good example of the kind of research software engineering should do more of.

Andy Maule, Wolfgang Emmerich, and David S. Rosenblum: "Impact Analysis of Database Schema Changes". The authors use dataflow analysis to identify the impact of RDBMS schema changes on OO applications---basically, if tables X, Y, and Z are changed, which classes need to be rewritten (or at least re-tested)? What makes the paper doubly interesting is their use of program slicing to reduce the code size, and thereby also the time required for the analysis.

Steven P. Reiss: "Tracking Source Locations". Compares different ways of tracking logical locations in source code during development and refactoring. The result is that simple techniques are just as effective in practice as more complicated schemes.

Gilly Leshed, Eben M. Haber, Tara Matthews, and Tessa Lau: "CoScripter: Automating and Sharing How-To Knowledge in the Enterprise". I've blogged about this before, but it's worth repeating. CoScripter is "a collaborative scripting environment for recording, automating, and sharing web-based processes" that allows people to record what they're doing with their browser, then share those recordings as executable how-tos with their colleagues. Cool.

Andrew J. Ko and Brad A. Myers: "Debugging Reinvented: Asking and Answering Why and Why Not Questions About Program Behavior".
Another good paper on Whyline, a tool that lets programmers ask "why does this variable have this value?" and "why didn't this branch get executed?" An empirical study found that novices using Whyline were twice as fast at finding bugs as experts without it.

Xuezheng Liu, Zhenyu Guo, Xi Wang, Feibo Chen, Xiaochen Lian, Jian Tang, Ming Wu, M. Frans Kaashoek, and Zheng Zhang: "D3S: Debugging Deployed Distributed Systems". D3S is a model checking tool that lets developers specify predicates on distributed properties of a deployed system, then gathers data while the system is running and checks the predicates to detect errors. What makes it beautiful is that it automatically keeps enough trace data to tell the programmer why the fault occurred. I've wanted a useful and usable parallel debugger for more than 20 years; these ideas will probably be a big part of whatever finally comes along.

Gursimran Singh Walia, Jeffrey C. Carver, and Nachiappan Nagappan: "The Effect of the Number of Inspectors on the Defect Estimates Produced by Capture-Recapture Models". Capture-recapture is used to estimate the number of wild animals in an area---catch a few, tag 'em, release 'em, then catch a few more, see how many have already been tagged, do some statistical magic, and voila, a population estimate. The idea has been applied to bugs before, but here, the authors look at how the number of inspectors affects the estimates produced by the technique. Again, it's not earth-shattering, it's just good, solid science.

Diomidis Spinellis: "A Tale of Four Kernels". Spinellis applies various quality metrics to the code bases of FreeBSD, GNU/Linux, Solaris, and Windows. "The aggregate results indicate that across various areas and many different metrics, four systems developed using wildly different processes score comparably.
This allows us to posit that the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any." Again, I'm not convinced---I think the fact that the software engineers who work on kernels probably all read the same textbooks, and learned from the same examples, is at least as big a factor as requirements---but it's a good starting point for informed debate.

Christine Hofmeister, Philippe Kruchten, Robert L. Nord, Henk Obbink, Alexander Ran, and Pierre America: "A General Model of Software Architecture Design Derived from Five Industrial Approaches". The authors compare and contrast five industrial software architecture design methods, and find that they have more in common than appears at first glance. I looked at four of the approaches they cover when I was teaching CSC407: Software Architecture, and didn't find any of them particularly useful; it's kind of nice to know I didn't miss anything.

Joe Armstrong: "A History of Erlang". This massively-parallel Prolog-like language may or may not be the future of multicore computing, but Armstrong's retrospective look at its ups and downs, and the random acts of management that got it where it is now, is a lot of fun to read. Now, would someone please do a similar paper on Python?

Kai-Yuan Cai and David Card: "An Analysis of Research Topics in Software Engineering---2006". The authors trawled the top seven journals and top seven conferences in software engineering, classified the papers, and counted. The results? Testing and Debugging is a hot topic, with Management, Verification, and Design Tools & Techniques close behind.

Fintan Culwin: "A Longitudinal Study of Nonoriginal Content in Final-Year Computing Undergraduate Projects". Or, "Are CS students cheating more than they used to?" The answer from this four-year study seems to be "yes".
Michael Terry and Matthew Kay: "Illustrated Consent Agreements". Terry is the driving force behind InGimp, an instrumented version of the open source image manipulation tool Gimp that collects information about what tools people are using and how. Since many of its users don't speak English, he and Kay put together a cartoon-style consent agreement; in this paper, they report a study that measures how well users understand what it's trying to tell them. "Surprisingly well" is the answer.

Mark Staples, Mahmood Niazi, Ross Jeffery, Alan Abrahams, Paul Byatt, and Russell Murphy: "An Exploratory Study of Why Organizations do not Adopt CMMI". The most popular reasons are "we're too small", "it's too costly", and "we don't have time". I don't know if "it doesn't actually make much of a difference" was an option or not.

Renée McCauley, Sue Fitzgerald, Gary Lewandowski, Laurie Murphy, Beth Simon, Lynda Thomas, and Carol Zander: "Debugging: A Review of the Literature from an Educational Perspective". What are students taught, what do they actually do, how much do they understand---this paper organizes and presents previous work on these and related questions.

Aaron B. Brown and Joseph L. Hellerstein: "An Approach to Benchmarking Configuration Complexity". As my students are discovering, setting software up can be as hard as building it. Here, the authors present a very simple model of human information processing, then use it and operation counts to quantify how easy or hard different applications are to get running. Their model is deliberately simple, and intended to be used as a baseline for further work; it's a very interesting idea.

Benjamin Livshits and Emre Kiciman: "Doloto: Code Splitting for Network-Bound Web 2.0 Applications". Doloto is a tool that automatically breaks browser-based web apps into chunks that can be transferred on demand to reduce startup times; they report that it typically reduces time-to-first-interaction by 20-40%.
Emre Kiciman and Benjamin Livshits: "AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of Web 2.0 Applications". Same guys, different tool---this one performs on-the-fly parsing and instrumentation of JavaScript as it is sent to the browser to gather "just enough" data. Students take note: a lot of today's coolest ideas involve metaprogramming of one kind or another.

Sumit Basu, Surabhi Gupta, Milind Mahajan, Patrick Nguyen, and John C. Platt: "Scalable Summaries of Spoken Conversations". Text summarization (i.e., boiling a large document down to a small one that still contains key information) is a hot topic in natural language processing. Here, the authors describe a tool for producing scalable summaries, i.e., summaries that can be shrunk or enlarged on demand to give as much or as little detail as the user wants.

Liang Zhou and Eduard Hovy: "Digesting Virtual 'Geek' Culture: The Summarization of Technical Internet Relay Chat". A more specialized variation on the same theme, this paper from 2005 presents a tool tailored to summarizing IRC chats on technical topics using a variety of heuristics and machine learning techniques. One of our summer students, Kosta Zabashta, may try incorporating these ideas into DrProject's IRC interface.

Raman Ramsin and Richard F. Paige: "Process-Centered Review of Object Oriented Software Development Methodologies". Compares and contrasts several OO methodologies, from RUP to the more famous members of the agile clan. About as exciting as cardboard, but I took away some ideas that I'll include in the next run of CSC301: Software Engineering.

Jon Howell, Collin Jackson, Helen J. Wang, and Xiaofeng Fan: "MashupOS: Operating System Abstractions for Client Mashups". Turns analogies between features of desktop operating systems and the way Web 2.0 apps work into a set of abstractions that isolate mutually-untrusting web services within browsers.
Christoph Csallner, Yannis Smaragdakis, and Tao Xie: "DSD-Crasher: A Hybrid Analysis Tool for Bug Finding". Combines existing tools to create one that captures the program's intended execution via dynamic invariant detection, statically analyzes the program to explore candidate paths, then automatically generates test cases to (try to) reproduce bugs. Again, it's not quite ready for prime time, but look for it in your IDE in not too many years.

Hung-Chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker: "Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters". Adds a merging step to Google's MapReduce model to allow operations on heterogeneous data (i.e., tables of different shapes). The second half of the paper shows how to express basic relational queries in terms of the MRM model.

Stephen F. Siegel, Anastasia Mironova, George S. Avrunin, and Lori A. Clarke: "Combining Symbolic Execution with Model Checking to Verify Numerical Programs". Combines model checking and symbolic execution to compare parallel versions of programs with their sequential (and usually much simpler) counterparts by getting a symbolic trace from the sequential program, then looking for steps in the parallel program that violate it. Doesn't scale up to large programs yet, but they do give several examples for small numbers of processors and interesting linear algebra codes.
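
For anyone who wants a feel for the Map-Reduce-Merge idea mentioned above, here is a minimal single-machine sketch in Python. It is not the paper's implementation: the function names (`map_reduce`, `merge`), the toy tables, and the choice of a simple equi-join as the merge step are all assumptions made for illustration. Each table gets its own map/reduce pass, and the merge step then combines the two reduced outputs by key.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one map/reduce pass and return {key: reduced_value}."""
    groups = defaultdict(list)
    for record in records:
        # The mapper emits zero or more (key, value) pairs per record.
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def merge(left, right, merger):
    """Combine two reduced outputs on matching keys (an equi-join)."""
    shared_keys = sorted(left.keys() & right.keys())
    return {key: merger(key, left[key], right[key]) for key in shared_keys}


# Hypothetical data: two tables with different shapes.
employees = [("alice", "eng", 120), ("bob", "eng", 100), ("carol", "ops", 90)]
departments = [("eng", "Toronto"), ("ops", "Ottawa"), ("hr", "Montreal")]

# Pass 1: total salary per department.
salary_by_dept = map_reduce(
    employees,
    mapper=lambda rec: [(rec[1], rec[2])],
    reducer=lambda key, values: sum(values),
)

# Pass 2: city per department (each department appears once).
city_by_dept = map_reduce(
    departments,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda key, values: values[0],
)

# Merge step: join the two reduced outputs on department name.
joined = merge(
    salary_by_dept,
    city_by_dept,
    merger=lambda key, total, city: (city, total),
)
print(joined)
```

Departments that appear in only one table (here, the hypothetical "hr") drop out of the result, which is what an inner join would do; other join flavours would need a different `merge`.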