September 01, 2012: Shaping the Next Generation (or, the exam defines the course defines the discipline)

As we reported a few days ago, one of our contributors, Greg Wilson, gave a keynote at the MSR Vision 2020 workshop in Kingston on August 20. In it, he explored why there is still a gulf between software engineering researchers and the people who build software for a living (see the slides or the discussion on Reddit for details). He also argued that:

  1. there's no easy way to close that gap, because most of the people in industry that researchers want to collaborate with have never encountered empirical software engineering studies, and therefore don't understand their scope or value; so
  2. researchers—many of whom are professors—should pivot the software engineering classes they teach to focus on how to analyze real-world data, and what past analyses have told us, so that the next generation of developers will understand (and listen, and want to collaborate).

To make this more concrete, Greg asked the workshop participants to make up some assignments and exam questions for such a course. Some of the suggestions are listed below; we would welcome other ideas as well (please post them as comments). We'd also like to know who'd be interested in trying to teach such a course at their institution, and what you think the prerequisites would have to be: statistics, obviously, but would a database course that introduced students to SQL be necessary? What about a natural language processing course? Or something else we haven't thought of?

Group 1

Give two examples of success stories in studies of the social aspects of software engineering.
  1. Reorganization based on social structures
  2. Identifying the "big players" in a software project
What are three sources of social interaction in software projects?
  1. Email
  2. IRC
  3. Bug comments
  4. Source code comments
Name three challenges in preprocessing emails.
  1. signatures
  2. code snippets
  3. stack traces
  4. fake/multiple email addresses
  5. identifying email headers and inline replies
  6. typos
  7. chat acronyms
  8. non-native speakers
  9. use of multiple languages
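To make the email question concrete, here is a minimal Python sketch of how a few of these challenges might be handled. It assumes plain-text bodies, `>`-prefixed inline quoting, "On ... wrote:" attribution lines, and the conventional `-- ` (dash-dash-space) signature separator. The `ALIASES` table and the function names are hypothetical. Only three of the listed challenges are covered (signatures, inline replies, multiple addresses). Code snippets, stack traces, typos, chat acronyms, and multilingual text would need considerably more work.

```python
import re

# Hypothetical alias table mapping alternate addresses to one canonical identity.
# In practice this would have to be built from project data.
ALIASES = {
    "jdoe@gmail.com": "jane.doe@example.org",
}

# Heuristic for reply attributions such as "On Mon, Bob wrote:".
ATTRIBUTION = re.compile(r"^On .+ wrote:$")


def canonical_sender(address: str) -> str:
    """Normalize an address and map it through the alias table."""
    addr = address.strip().lower()
    return ALIASES.get(addr, addr)


def clean_body(body: str) -> str:
    """Remove quoted inline replies, attribution lines, and a trailing signature."""
    kept = []
    for line in body.splitlines():
        # "-- " is the conventional signature separator; some clients drop the
        # trailing space, so a bare "--" is accepted as well.
        if line.rstrip() == "--":
            break
        stripped = line.strip()
        if stripped.startswith(">") or ATTRIBUTION.match(stripped):
            continue  # text quoted from an earlier message, or its attribution
        kept.append(line)
    # Collapse runs of blank lines left behind by the removed quotes.
    text = "\n".join(kept)
    return re.sub(r"\n{3,}", "\n\n", text).strip()


raw = (
    "Looks good to me.\n\n"
    "On Mon, Bob wrote:\n"
    "> Can you review the patch?\n\n"
    "-- \n"
    "Jane Doe\n"
)
print(canonical_sender(" JDoe@Gmail.com "))  # jane.doe@example.org
print(clean_body(raw))                        # Looks good to me.
```

These rules are deliberately naive. For example, a legitimate line that happens to start with ">" or read "On ... wrote:" would be dropped, which is exactly the kind of trade-off students could be asked to discuss.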
Group 2

  1. You are given a dataset A of OSS projects and a subset B of it. Evaluate whether a hypothesis H can be rejected on A and on B. Design the question so that H is significant (at the 0.05 level) on A but not on B. Discuss the discrepancy.
  2. Given a dataset and a specific question, perhaps taken from existing MSR papers, discuss which data mining approach is best suited to answering that question.
  3. Given a specific question (e.g., bug finding), which repositories should you use to answer it? Illustrate your answer with Bugzilla. How would you adapt this to Jira?
  4. Given that two variables A and B correlate, can you say "A causes B"? Why or why not?
  5. Repeat an existing analysis from an MSR paper. Do you get the same results? Vary a number of variables. How different are the results?
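One way to approach question 1 is to keep the effect identical and shrink the sample. The sketch below runs a pooled two-proportion z-test (normal approximation) using only the Python standard library. The counts are invented purely for illustration and do not come from any study: both A and its subset B show the same 30% vs. 20% rates, but B has a tenth as many projects. The function name and the `samples` table are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: (hits in group 1, size of group 1, hits in group 2, size of group 2).
# The rates are identical in A and B (30% vs 20%); only the sample size differs.
samples = {
    "A (all projects)": (150, 500, 100, 500),
    "B (subset)": (15, 50, 10, 50),
}

ALPHA = 0.05
for name, counts in samples.items():
    p = two_proportion_p_value(*counts)
    verdict = "reject H0 (equal rates)" if p < ALPHA else "cannot reject H0"
    print(f"{name}: p = {p:.4f} -> {verdict}")
```

With these numbers, A yields a p-value far below 0.05, while B yields roughly 0.25. The discrepancy comes from statistical power (sample size), not from a different underlying effect. Real MSR data would also raise issues this toy ignores, such as whether B is a random subset of A and whether the normal approximation is reasonable for small counts.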

Group 3


This post originally appeared at It Will Never Work in Theory.