Archive for the ‘Research’ Category

Two Solitudes Illustrated

December 6th, 2012

Jorge Aranda and I submitted a short opinion piece to Communications of the ACM in February 2012 that discussed some of the reasons people in industry and academia don’t talk to each other as much as they should. Ten months later, it has ironically turned into an illustration of one of the reasons: it was six months before we received any feedback at all, and we’ve now waited four months for any further word. In that time, Jorge has left academia and I’ve taken a job with Mozilla, so we have decided to withdraw the manuscript and publish it here. We hope you find it interesting, and we would welcome comments.


Two Solitudes

Greg Wilson and Jorge Aranda

In 2001, one of us (GW) started supervising senior undergraduate projects in computer science at the University of Toronto. His main reason for doing it was that it was fun, but he was also frustrated by how little the junior programmers we were hiring straight out of school knew about actually building software. They could talk big-O ’til they went blue in the face, but many had never seen version control, had forgotten what little they had ever learned about using Make, and thought testing was something you only did when it was in the grading scheme.

This isn’t a new complaint, of course. People in all fields have long complained that universities don’t prepare students for the real world, while professors have always countered that their job is to teach timeless fundamentals. What surprised him, though, was how little interest the two sides seemed to have in getting to know each other, at least in software engineering. While researchers and practitioners may mix and mingle in other specialties, every software engineering conference seemed to be strongly biased to one side or the other.

For example, less than 20% of the people who attend the International Conference on Software Engineering come from industry, and most of those work in labs like Microsoft Research. Conversely, only a handful of grad students and one or two adventurous faculty attend big industrial conferences like the annual Agile get-together.

One consequence of this is that researchers and practitioners spend a lot of time talking past one another. Wilson ran into this headlong six years ago when he was asked to teach an undergraduate course on software architecture. A quick search on Amazon turned up plenty of books on the subject with words like “Practical” and “Essential” in their titles. What didn’t turn up were descriptions of the actual architectures of actual systems. Every book talked about how to describe architectures, how important it was to have a good one, and so on. When it came time to actually show readers a few, though, all they offered were a couple of pages on pipe-and-filter, client-server, MVC, and possibly some kind of peer-to-peer system. And even then, most didn’t discuss actual systems: the boxes in their box-and-arrow diagrams had labels like “component 1” and “component 2”.

The more he thought about this, the stranger it seemed. We wouldn’t think much of a university program in architecture whose graduates had never studied any real buildings in detail. We also wouldn’t be surprised if those graduates were as bad at designing buildings as most freshly-minted computer scientists are at designing software.

In this case, the problem suggested its own solution. In May 2006, Wilson emailed every famous programmer he could find an address for and invited them to contribute a chapter to a book on software design. More specifically, he asked them to describe the most beautiful piece of code they’d ever seen or written, and explain what made it beautiful in their eyes.

The result, published a year later as Beautiful Code, was well received by practitioners, but uptake in academia was close to zero. As he was trying to figure out why not, he was asked to teach another undergraduate course, this time on software engineering. Once again, he discovered that there was a lot less in most textbooks than met the eye. For example, the books all described UML in great (some might say “excruciating”) detail, but in his eighteen years as a professional programmer, Wilson had only ever worked with one programmer who actually used it voluntarily (a Russian mathematician who wouldn’t tie his own shoes without first brushing up on knot theory). Conversely, making things installable is as big a part of developing real applications as allocating methods to classes, but most books didn’t discuss it at all, and those that did seemed to think it was a question of keeping a configuration database up to date.

But practitioners were (and are) guilty of equally great sins. At industry-oriented gatherings, it seems that a strong opinion, a loud voice, and a couple of pints of beer constitute “proof” of almost any claim. Take test-driven development, for example. If you ask its advocates for evidence, they’ll tell you why it has to be true; if you press them, you’ll be given anecdotes, and if you press harder, people will be either puzzled or hostile.

Most working programmers simply don’t know that scientists have been doing empirical studies of TDD, and of software development in general, for almost forty years. It’s as if family doctors didn’t know that the medical research community existed, much less what they had discovered. Once again, the first step toward bridging this gulf seemed to be to get each group to tell the other what they knew.

On the one side this led to The Architecture of Open Source Applications, in which the people behind a double dozen open source projects walk readers through the high-level design of their applications, and explain why things are the way they are and how well they’ve worked. Some of the programs they discuss, like Bash and Sendmail, are as old as the Internet, while others are as fresh as this morning’s top questions on Stack Overflow. And while some are gems of clean design, others are, in the words of one contributor, like third-world cities, with clean, well-kept neighborhoods lying next to run-down slums that no one seems willing to clean up.

On the other side is Making Software, in which leading software engineering researchers present and discuss key discoveries. The topics range from the impact of test-driven development on productivity (probably small, if there is one at all) to whether machine learning techniques can predict fault rates in software modules (yes, with the right training). One favorite is the discovery that geographic distance between members of a development team is only a weak predictor of how many errors there are in their software; a much better predictor is how far apart they are in the company org chart.

After Making Software came out, we, along with Daniela Damian, Marian Petre, and Margaret-Anne Storey, decided to explore how practitioners perceived software development research. We interviewed several high-profile practitioners (CEOs, senior architects, managers, developers, and entrepreneurs) and asked them what they thought of their academic counterparts, and what questions they thought researchers should focus on.

Their answers were scathing, but not surprising. They saw software engineering research as dated, dogmatic, focused on pointless questions, and biased toward either big projects or toy problems. In the words of one senior architect we interviewed:

[I'm afraid] that industrial software engineers will think that I’m now doing academic software engineering and then not listen to me. (…) If I start talking to them and claim that I’m doing software engineering research, after they stop laughing, they’re gonna stop listening to me. Because it’s been so long since anything actually relevant to what practitioners do has come out of that environment, or at least the percentage of things that are useful that come out of that environment is so small.

This kind of criticism is understandable. After all, plenty of practitioners remember having wasted countless hours warming the bench at college while a professor droned about the vital importance of this process or that notation with minimal experience and maximum conviction. And yet, as demonstrated by Making Software, and by the increasing number of savvy papers appearing each year, research is shifting in ways that practitioners would welcome if they hadn’t been conditioned by past irrelevance to dismiss everything coming out of academia.

In 2011, we presented the results from our interviews in a panel at the International Conference on Software Engineering (ICSE). Our panelists were people with one foot on each side of the divide: Lionel Briand from the Simula Research Lab; Toshiba’s Tatsuhiro Nishioka; Google’s John Penix; Wolfram Schulte, from Microsoft Research; Peri Tarr, from the IBM T.J. Watson Center; and David Weiss, from Iowa State University. The bad news was that the panelists confirmed the near-complete disconnect between software research and practice. The good news was that judging by comments from them and from the audience, plenty of people would like to fix that.

But achieving that will be hard, because the root problem isn’t entirely one of perceptions. There actually are differences between research and practice, three of which stand out. First, the incentive structure for researchers does not reward patient cultivation of long-lasting partnerships with practitioners. Second, researchers and practitioners have different understandings of what counts as evidence. Practitioners, being trained with an engineering mindset, expect generalized and quantitative results. They want to know by what percentage productivity will improve if they adopt a new practice, but this is a level of precision that no honest scientist can offer them today. And third, most research findings offer only piecemeal improvements; it simply isn’t worth a practitioner’s time to fight the inertia in their organizations for gains which are both small and uncertain.

In Canada, the phrase “two solitudes” refers to the lack of communication—and the lack of interest in communicating—between Anglophones and Francophones. Over the past three years, we have learned that it’s also a good description of the gulf between software engineering researchers and practitioners. They’re like two branches of an extended family that send each other Christmas cards, and occasionally show up for each other’s weddings or funerals, but aren’t in day-to-day or even year-to-year contact.

It doesn’t have to be like this, of course. Many researchers (particularly younger ones) would love to talk to practitioners about what problems really matter. At the same time, practitioners could save themselves a lot of heartache by finding out what we actually do know about how to develop software, and by learning how to tell something that has been proven from something that has merely been asserted.


Greg Wilson has been a programmer, teacher, and author. He can be found online at http://software-carpentry.org, http://aosabook.org, and http://neverworkintheory.org. He received his PhD in Computer Science from the University of Edinburgh in 1993.

Jorge Aranda is a software developer. He received his PhD in Computer Science from the University of Toronto in 2010, and until recently conducted research on coordination in software teams.

If you enjoyed this post, you may also enjoy the presentation Greg gave at MSR Vision 2020.

Research

Two Solitudes (talk)

August 21st, 2012

The slides from my keynote today at MSR Vision 2020 are now available on Slideshare. Long story short, I proposed that the only way to improve communication between researchers and practitioners in software engineering is to create an empirical, evidence-based course in software engineering (with assignments and exams that require students to analyze code and data themselves, do grounded theory analysis of interviews, etc., so that they understand how these studies actually work and what questions they can and can’t answer), and then wait ten years for those students to become team leads and managers.

Later: the slides are being discussed on Reddit. Cool!

Research

Past and Future

September 19th, 2011

Two articles I read over the weekend neatly encapsulate the past and future of software engineering research (at least, the kind of SE research that I’m interested in). The first is Jason McC. Smith’s “The Pattern Instance Notation: A simple hierarchical visual notation for the dynamic visualization and comprehension of software patterns” (Journal of Visual Languages and Computing, 22 (2011), pp. 355-374, DOI:10.1016/j.jvic.2011.03.003), which describes a nested-box notation for representing design patterns graphically. The second is “A decade of research and development on program animation: the Jeliot experience” by seven scientists from the Weizmann Institute and the University of Eastern Finland (same journal and volume, pp. 375-384, DOI:10.1016/j.jvic.2011.04.004), which sums up ten years of empirical studies of three successive program animation tools. Both papers are about helping people understand programs through pictures, but there’s one key difference. While the first paper is long on semantics and examples, it doesn’t say anything about usability: its author claims that PIN is easy to understand (or at least, easier than alternatives like pure UML), but the same claims are made daily by the creators of every other programming system or notation out there. The second paper, in contrast, is all about usability studies. Do novices learn better, or faster, if program execution is animated? Do such animations help experienced programmers as well? Do the two groups benefit from the same kinds of animations equally, or are their needs different enough to merit different approaches? And at a higher level, how should we go about trying to answer these questions, and how have the answers we’ve found so far affected the design of our tools? Its claims might be more cautious than those of the PIN paper, but they’re grounded in careful studies of real users in the real world.
My 21-year-old self might have preferred the simplicity of the former, but today, having seen a dozen bandwagons roll by, I realize that the caution and seasoned subtlety of the latter is what real progress actually looks like.

Research

Summary of ICSE Panel

June 9th, 2011

Jorge Aranda has posted a good summary of the panel session at ICSE on “What Industry Wants From Research”. We hope to have more news soon…

Making Software, Research

Refactoring Yahoo! Pipes

June 3rd, 2011

I’ve been griping on Twitter about the fact that the official copies of most IEEE and ACM papers are hidden behind paywalls, which is a great way to ensure that they don’t reach people in industry who might otherwise, you know, read them, learn something, and possibly even adopt the tools, practices, or ideas they contain. Luckily, most researchers routinely break those copyright agreements and post their work on their own sites, so I’m actually able to read some of the papers that were presented at ICSE 2011 this year. The best so far is Kathryn Stolee and Sebastian Elbaum’s “Refactoring Pipe-like Mashups for End-User Programmers”, in which they look at the dataflows people are constructing using Yahoo! Pipes [1] and categorize ways of reorganizing them to eliminate redundancy, improve efficiency, and so on. It’s solid, practical work, and the paper is a pleasure to read.

[1] Pipes is the best approximation I’ve seen to date of something that puts the power of the mashable web in the hands of everyday people.  If you haven’t seen it, give it a try, and then round up some friends and create an open source clone using HTML5, processing.js, and similar tools.

Research

How Do Actual Software Engineers Perceive Software Engineering Research?

May 20th, 2011
This post is based on the work of Jorge Aranda, Margaret-Anne (Peggy) Storey, Daniela Damian, Marian Petre, and me. It is a re-post from Jorge’s blog—please post your thoughts there.


This research was done as a follow-on to “Making Software”, which summarizes what we actually know about how software is developed, and why we believe it’s true.

Listening to software professionals over the past few years, we sometimes get the impression that software development research began and ended with Fred Brooks’ case study of the development of the IBM 360 operating system, summarized in “The Mythical Man-Month,” and with his often-quoted quip that adding people to a late project only makes it later. Now and then, mentions of Jerry Weinberg (on ego-less programming) and of DeMarco and Lister (on how developers are more productive if they’re given individual offices) pop up, and for the most part, it seems as if the extent of what software development academics have to offer to practitioners is a short list of folk sayings tenuously validated by empirical evidence. The fact that Brooks, Weinberg, DeMarco, and Lister are not academics—or were not at the time of these contributions, as in the case of Brooks—only makes the academic offerings look worse.

And yet, the software development academic community is large and increasingly empirical. The International Conference on Software Engineering (ICSE), its most important gathering, consistently draws a crowd of over a thousand researchers. Researchers mine software repositories, they perform insightful ethnographic studies, and they build sophisticated tools to help development teams become more efficient. Many researchers, from junior Masters students to tenured professors, jump at the opportunity to study and help software organizations. In other words, there is a significant academic offering of results on display. But if we look at the list of ICSE attendees, we discover that industrial participation is very low (less than 20% last year), and there seems to be very little dissemination of scientific findings overall. What is going on? Are we wasting our time studying problems that practitioners do not care about? Or do we have a communication problem? Are practitioners expecting help with intractable problems? And most importantly, how can we change this situation?

To explore these questions, we decided to interview leading practitioners. Over the past few months, we talked to CEOs, senior architects, managers, and creators of organizations and products most of us would recognize and use. We asked them to tell us their perceptions of our field and how they think we could improve our relationships with them. One outcome of these interviews was the organization of a panel at ICSE, where people that straddle the line between research and practice will use insights from these interviews as a starting point to discuss the apparent industry-research gap.

We are still thinking about how to disseminate the observations that our ongoing interviewees have been giving us. For now, we want to broadcast some of the most important points from our conversations here, in blog post format, hoping to give them as much exposure as possible.

Perceptions of software research

For those of us venturing out of the ivory tower to do empirical research, it shouldn’t be a surprise that many practitioners have a general disregard for software development academics. Some think our field is dated, and biased toward large organizations and huge projects. Others feel that we spend too much time with toy problems that do not scale, and as a result, have little applicability in real and complex software projects. Most of our interviewees felt that our research goals are unappealing or simply not useful. This was one of the strongest threads in our conversation: one person told us that our field is this “fuzzy stuff at a distance that doesn’t seem to affect [him] much,” another, that we ignore the real cutting-edge problems that organizations face today, and one more, a senior architect about to make the switch to academia himself, gave a rather scathing critique of the field.

“[I'm afraid] that industrial software engineers will think that I’m now doing academic software engineering and then not listen to me. (…) if I start talking to them and claim that I’m doing software engineering research, after they stop laughing, they’re gonna stop listening to me. Because it’s been so long since anything actually relevant to what practitioners do has come out of that environment, or at least the percentage of things that are useful that come out of that environment is so small.”

Part of the problem seems to be that we have only been able to offer professionals piecemeal improvements. Software development is essentially a design problem, a wicked problem, and it is not amenable to silver bullets (as, ahem, Fred Brooks argued convincingly decades ago). But the immaturity and difficulty of software development still make it a prime domain for the presence and profit of snake oil salesmen—people that are not afraid to advertise their miraculous formulas, grab the money and run. Honest academics, reporting improvements of 10% or 20% for a limited domain and under several constraints, have a hard time being heard above the noise.

Difficulty in applying our findings

The problem with piecemeal improvements has another angle: many professionals can’t be bothered to change their processes and practices for gains as small as 10% or 20%, since overcoming their organizational inertia and forcing themselves to incur significant risks may be more costly than the benefits they’d accrue.

“(…) it would depend in part of how cumbersome your techniques are; how much retraining I’m going to have to do on my staff. (…) I might decide that even if you’re legit and you actually do come up with 15%, that that’s not enough to justify it.”

This puts us in a bit of a quandary as we’re extremely unlikely to come up with any technique that will guarantee a considerable improvement for software organizations. At the same time, they’re extremely unlikely to adopt anything that doesn’t guarantee substantial improvements or that requires them to change their routines significantly. However, there are a few ways out of this problem. One of them is to propose lightweight, low-risk techniques. Another is to aim for organizational change at the periphery, in pilot projects, rather than at the core, hoping that the change will be appealing enough that it will spread through the organization. But it’s an uphill battle nonetheless.

What counts as evidence?

Another, perhaps bigger problem lies in the perception of what counts as valid scientific evidence. For better or worse, software developers have an engineering mindset, and have an idea of science as the calm and reasoned voice of hard data among the cackling of anecdote. The distinction between hard data and anecdote is binary, and hard data, according to most of our interviewees, is quantitative data; anything else is anecdote and should be dismissed.

“without measurements you can’t… it’s all too wishy-washy to be adopted.”

“managers are coin operated in some sense. If you can’t quantify it in terms of time or in terms of money, it doesn’t make much difference to them. (…) I think there does need to be some notion of a numeric or at least an objective measure.”

“So when you’re gonna tell me that I’m wrong, which is a good thing, you know you gotta have that extra ‘yeah, we ran these groups on parallel and guess what, here are the numbers’”

Why is this a problem? Because over the years, we as a community have come to realize that many of the really important software development problems are not amenable to study with controlled experiments or with (exclusively) quantitative data. Ethnographies, case studies, mixed-method studies, and others, can be as rigorous as controlled experiments, and for many of the questions that matter, they can be more insightful—but they don’t have the persuasive aura of a string of numbers or a p value. Faced with this perception, we have two choices. First, to give practitioners what they (think they) want: controlled experiments to the exclusion of everything else (never mind the fact that often these won’t be able to actually answer the questions that matter to professionals in a scientifically sound manner), or second, to push for a better dissemination of our results and methods, making the argument that there’s more to science than trial runs and statistical significance, and helping practitioners distinguish between good and bad science, whatever its methods of choice.

Dissemination of results

Although, from talking to our interviewees, it was clear that the dissemination of scientific results is almost non-existent, this seems to be a problem that we can address more easily than the others. Of course, presenting research findings to non-academics, as our interviewees reminded us, is difficult; you need to be a good storyteller, you need passion, clear data, and a strong underlying argument. To some extent, this is feasible.

In any case, it became evident that academic journals and conferences are not the right venues to reach software professionals overall. Blog posts may help communicate some findings (but it is hard to be heard above the noise), and books could help too (especially if you have Brooks’ writing abilities). Another alternative is intermediate journals and magazines, like IEEE Software and ACM Queue. One interviewee suggested that we should be visiting industry conferences way more often; when a researcher ventures into an industry conference with interesting data, it does seem to generate excitement and good discussions, at the very least.

Areas of interest

We asked our interviewees what questions we should focus on; that is, what problems they struggle with on a frequent basis that researchers might tackle on their behalf. A few themes arose from their lists of potential problems:

  • Developer issues were very common. These include identifying wasteful use of developer time, keeping older engineers up to date with a changing landscape (an interesting riff on the rather popular research question of bringing new engineers up to speed with the current organizational landscape), identifying productive programmers and efficient ways to assemble teams, overcoming challenges of distributed software development, achieving better effort prediction, learning to do parallel programming well, and identifying mechanisms to spread knowledge of the code more uniformly throughout the organization.
  • Evaluation issues also arose frequently. Essentially, these consist of having academia perform the role of fact checker or auditor of proposals that arise from consultants, opinion leaders, and other influential folks in the software development culture. Many interviewees were curious to find out to what extent agile development works as well as its evangelists claim, for instance, but their curiosity also extends to other processes, techniques, and tools.
  • Design issues came up as well. One in particular seemed interesting: figuring out why some ideas within a project die after a lot of effort was spent on them. This could lead into techniques to identify ideas probably doomed to failure early on, so that the team can minimize the resources spent on them.
  • Tool issues were rather popular, and for many of the tools that our interviewees mentioned there is already some good work from our community that hopefully can be turned into tools that will be successfully adopted by the mainstream. Our interviewees were interested in tools that would provide warnings when a developer was about to enter a conflicting area of the code, in good open source static analysis tools, in test suite analytics, and in live program analysis tools that scale well.
  • Code issues, though less common, were interesting as well. In particular, studying and providing help in dealing with the blurred line between project code and configuration code (and treating configuration code with the same care and level of tool-set sophistication that we give to project code), and providing a better foundation for higher-level abstractions such as modeling languages.
  • User issues arose more frequently than they seem to in our academic literature. Several of our interviewees wanted to bring user experience to the forefront, and some were concerned that software development skill and user experience gut instinct were rarely found in sufficient quantities in the same professional. One of them wanted to bring the kind of mining techniques that we use to analyze software repositories into an analysis of customer service audio and email data.

So as you can see, our interviewees brought up plenty of interesting research questions. Some of these questions are more approachable than others; some have already been addressed numerous times in research and are therefore now in need of better dissemination of findings.

In summary, the managers, creators, and architects we interviewed confirmed our fear that the software research academic community is extremely disconnected from software practice. This seems to be partly our fault (we often do not work on the issues that practitioners worry about, we rarely reach out to them purposefully), and partly a misconception of what it means to do science and what counts as valid evidence in our domain.

We hope to further explore these initial insights from industry at our upcoming panel at ICSE. We have sought panelists that straddle the line between research and practice to provide their perspectives on what they think compelling evidence would look like to industry, what they consider to be the important questions for academia, to suggest to us how we can more effectively disseminate results and to suggest how we can engage in productive collaborative research that is of benefit to both sides. In the meantime, we welcome your comments on this post! And stay tuned, as we will follow up to summarize the discussion from the ICSE panel.

Research

More Musings on the Value of a PhD

December 29th, 2010

Yet another good post from Mark Guzdial pointed me at an article in The Economist about the value (or otherwise) of a PhD. Key stat (bold emphasis mine):

A study in the Journal of Higher Education Policy and Management…shows that British men with a bachelor’s degree earn 14% more than those who could have gone to university but chose not to. The earnings premium for a PhD is 26%. But the premium for a master’s degree…is almost as high, at 23%. In some subjects the premium for a PhD vanishes entirely. PhDs in maths and computing, social sciences and languages earn no more than those with master’s degrees. The premium for a PhD is actually smaller than for a master’s degree in engineering and technology, architecture and education. Only in medicine, other sciences, and business and financial studies is it high enough to be worthwhile. Over all subjects, a PhD commands only a 3% premium over a master’s degree.

Learning, Research

Yes, We *Can* Design Languages for Human Beings

October 8th, 2010

Lambda the Ultimate recently had a nice summary of a paper titled, “Is Transactional Programming Actually Easier?” In it, Rossbach, Hofmann, and Witchel report a study in which 147 undergrads in an operating systems course solved problems using traditional concurrency control mechanisms and newfangled memory transactions. The result? Students reported that transactions were harder to use, but actually had fewer errors in their synchronization code when using them.

At the risk of sounding like a curmudgeon (yes, David, I’m looking at you), why the hell don’t we do more of this? Why don’t we apply usability testing techniques to programming language features? It’s easy to do in small cases, and as Microsoft’s Steven Clarke discusses in his chapter in Making Software, when done systematically, it can make programmers’ lives a lot better.

Research

Students and Code Review

August 16th, 2010

Mike Conley has posted some early results from his study of student code reviews.  One of the most interesting is that students who have reviewed their peers’ code believe they know more about the quality of their own work. He should be finished his analysis in a few weeks—I’m looking forward to hearing what else he’s found.

Research

Participants Needed for a Study of Code Review

August 8th, 2010

Andrew Smith, a graduate student I have been working with, is looking for people to take part in a study:

…to determine what advantages and disadvantages three types of code review tools have.

As a participant you would spend about an hour doing 3 code reviews (20 minutes each):

  • One on paper, just like marking in the good old days.
  • One using ReviewBoard, a popular code review tool.
  • One using a Tablet PC, with custom-built software.

No code review or marking experience is required, all that you need is to be familiar with either C or Java.

If you can spare an hour, we’d both be grateful for your help.

Research