What Qualitative Data Looks Like in SE Research

Learning Goals

Lesson

Check Understanding

What distinguishes qualitative data from quantitative data? Give one example of each from the same GitHub repository.

Quantitative data consists of counts or measurements that can be compared numerically: for example, the number of open issues in a repository. Qualitative data consists of text or other non-numerical content that has to be interpreted for meaning: for example, the text of those issue descriptions, which might reveal whether developers are frustrated, confused, or requesting enhancements. Both come from the same repository, but they answer different questions: the count tells you how many issues are open, while the text tells you what those issues are about and how their authors feel.

A researcher wants to study developer frustration during code review. Their approach is to count the number of words in each review comment and use that count as a measure of frustration. What is wrong with this approach, and what should they do instead?

Word count is a proxy for frustration, and a bad one. A long comment might be a patient explanation of a complex issue, not a frustrated rant; a short comment might be curt and hostile. Word count measures length, not sentiment or emotion. The researcher is substituting something easy to measure for something meaningful but hard to measure. A better approach would be to read a sample of comments and code them for frustration directly using a defined coding scheme, have two coders apply that scheme independently, and then measure how well they agree. Alternatively, they could recruit developers to rate comments on a frustration scale, which at least grounds the measure in human judgment.
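The agreement step can be made concrete with Cohen's kappa, one common statistic for two coders who each assign one categorical label per item. The lesson does not name a specific statistic, so treat this as one reasonable choice rather than the required one. Kappa compares the observed agreement p_o with the agreement p_e expected by chance from each coder's own label frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes two coders have each labeled the same ten review comments as "frustrated" or "neutral"; the labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one label per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: fraction of items where the two labels match.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: probability both coders pick the same label at random,
    # given how often each coder used each label.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both coders used one identical label for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for ten review comments, coded independently.
coder_a = ["frustrated", "neutral", "neutral", "frustrated", "neutral",
           "neutral", "frustrated", "neutral", "neutral", "neutral"]
coder_b = ["frustrated", "neutral", "frustrated", "frustrated", "neutral",
           "neutral", "neutral", "neutral", "neutral", "neutral"]

print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.52
```

Here the coders agree on 8 of 10 comments (80%), but because both label most comments "neutral", chance alone would produce 58% agreement, so kappa is only about 0.52. Raw percent agreement can therefore look more reassuring than it should. For real studies, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.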

When is qualitative research the right choice? Give two criteria and one example from SE research.

Qualitative research is appropriate when you do not yet know what the important variables are, so you cannot write a survey with fixed-response options. It is also appropriate when the phenomenon involves social or motivational factors that numbers cannot capture. The Ait et al. study illustrates both: its survival data show that projects go inactive but not why, and a researcher setting out to interview maintainers would not know in advance whether the cause was technical (no one could merge contributions), social (maintainers burned out), or economic (employers withdrew support). A survey with fixed options would have assumed answers to a question that was still open.

Hoda [Hoda2024] argues qualitative work is harder to do rigorously than quantitative work. What makes it harder?

In quantitative work, the analysis pipeline is mostly fixed once the data is collected: apply a statistical test, report a p-value and effect size. In qualitative work, the researcher is the instrument. Two people reading the same interview can reach different conclusions depending on their background, assumptions, and what they noticed first. Making the analysis rigorous requires documenting every interpretive choice, having multiple coders work independently and measuring their agreement, and being transparent about how themes were constructed. There is no statistical test that can substitute for that kind of disciplined attention to the text.

Exercises

Reading commit messages for motivation

Find five recent commit messages in any open-source repository on GitHub. For each one, write one sentence describing the developer's apparent motivation. Compare your interpretations with a partner: where did you agree, where did you disagree, and what does each disagreement reveal about the difficulty of reading intent from a short text?

Writing open-ended interview questions

The Ait et al. study counts inactive projects but cannot explain why they went inactive from survival data alone. Write three interview questions you would ask a former maintainer to understand why they stopped contributing. Each question must be open-ended (cannot be answered with yes or no) and non-leading (does not hint at the answer you expect). For each question, write one sentence explaining what kind of information it is designed to surface.

Critiquing a proxy measure

A colleague proposes to measure "developer satisfaction" by running a sentiment classifier over commit messages and computing the fraction of positive-sentiment commits per developer per month. Identify two specific ways this approach might mislead you that a survey with open-ended questions would not. For each problem, write one sentence describing what the open-ended question would reveal that the automated measure would miss.
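To make the colleague's proposal concrete before critiquing it, here is a minimal sketch of the computation it describes. The classifier is a hypothetical stand-in (a tiny positive-word list, not any real sentiment library), and the commit data, author names, and the `(author, 'YYYY-MM-DD', message)` tuple format are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for a real sentiment classifier: a tiny word list.
POSITIVE_WORDS = {"improve", "improves", "clean", "nice", "great", "simplify"}


def is_positive(message):
    """Toy classifier: a message is 'positive' if it contains a listed word."""
    words = message.lower().split()
    return any(w.strip(".,!:") in POSITIVE_WORDS for w in words)


def positive_fraction_by_month(commits):
    """Fraction of positive-sentiment commits per (author, 'YYYY-MM').

    commits: iterable of (author, 'YYYY-MM-DD', message) tuples.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for author, date, message in commits:
        key = (author, date[:7])  # group by author and calendar month
        totals[key] += 1
        if is_positive(message):
            positives[key] += 1
    return {key: positives[key] / totals[key] for key in totals}


# Hypothetical commits, for illustration only.
commits = [
    ("alice", "2024-03-02", "Fix typo in README"),
    ("alice", "2024-03-05", "Improve error message for missing config"),
    ("bob", "2024-03-07", "Revert broken migration, again"),
    ("bob", "2024-03-20", "Bump version"),
]

print(positive_fraction_by_month(commits))
# {('alice', '2024-03'): 0.5, ('bob', '2024-03'): 0.0}
```

Running a sketch like this on a handful of real commit messages, and then asking what each score actually says about how the author felt, is a quick way to start looking for the problems the exercise asks you to identify.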

Sorting phenomena by method

List three SE phenomena that you think quantitative methods handle well and three that you think require qualitative investigation. For each of your three qualitative examples, write one sentence stating the specific question you would be trying to answer and one sentence explaining why a count or measurement would not be sufficient to answer it.

Diagnosing a common mistake

Read the abstract of Hoda [Hoda2024] and identify what the author describes as the most common mistake researchers make when analyzing qualitative data. Write one sentence summarizing the mistake and one sentence explaining how you would avoid it in your own work. Then write a prompt you could give to an LLM to help you detect that mistake when reviewing a qualitative methods section:

[your prompt here]