Thematic Analysis

Learning Goals

Lesson

Check Understanding

What is the difference between a code and a theme in thematic analysis?

A code is a label applied to a short segment of text describing what that segment is about. It is local and specific: a code might be "developer blames tooling" or "user asks for an example." A theme is a higher-level pattern that groups multiple codes together around a shared meaning. The theme "frustration with the development environment" might collect codes about tooling complaints, environment setup problems, and debugging difficulties from many different places in the data. Codes are the raw material; themes are the result of interpreting that material.
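The relationship between codes and themes can be sketched as a small data structure. This is a minimal illustration, not a tool from the lesson: the segment IDs, the second theme name ("requests for concrete guidance"), and the helper `segments_by_theme` are all hypothetical. In a real study a human assigns every code and decides which codes belong to which theme.

```python
from collections import defaultdict

# Hypothetical coded segments: (segment_id, code). A human reader assigns
# each code after reading the segment for meaning.
coded_segments = [
    ("c01", "developer blames tooling"),
    ("c02", "environment setup problem"),
    ("c03", "user asks for an example"),
    ("c04", "debugging difficulty"),
    ("c05", "developer blames tooling"),
]

# Hypothetical themes: each theme names the codes it collects. The second
# theme is invented for this sketch.
themes = {
    "frustration with the development environment": {
        "developer blames tooling",
        "environment setup problem",
        "debugging difficulty",
    },
    "requests for concrete guidance": {"user asks for an example"},
}


def segments_by_theme(coded_segments, themes):
    """Return {theme: [segment_id, ...]} by looking up each segment's code."""
    code_to_theme = {
        code: theme for theme, codes in themes.items() for code in codes
    }
    grouped = defaultdict(list)
    for segment_id, code in coded_segments:
        grouped[code_to_theme[code]].append(segment_id)
    return dict(grouped)


result = segments_by_theme(coded_segments, themes)
for theme, segment_ids in result.items():
    print(theme, segment_ids)
```

The lookup is mechanical; the interpretive work sits entirely in the two hand-written tables, which is where codes and themes are actually decided.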

A researcher codes 200 issue comments and reports "I found five themes by grouping the comments that used similar words." What is missing from this approach, and which step of the Braun & Clarke process did they skip?

Grouping by similar words is keyword clustering, not thematic analysis. The researcher skipped step 4 (reviewing themes) and arguably never completed step 2 (generating initial codes) correctly, since codes should capture meaning rather than vocabulary. Two comments can use the same word with opposite meanings, and two comments can describe the same idea in completely different words. Grouping by words therefore produces a taxonomy of vocabulary, not a taxonomy of meaning. What is missing is any evidence that the researcher read the data for meaning, checked that each candidate theme held together, and revised themes that were internally inconsistent.
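A toy comparison may make the difference concrete. The three comments and their codes below are invented for illustration, and the functions `group_by_keyword` and `group_by_code` are hypothetical helpers, not part of any library. The codes are assigned by hand, as a human coder would do.

```python
import re
from collections import defaultdict

# Invented comments with hand-assigned codes (illustrative only).
comments = [
    {"id": 1, "text": "The build is slow after the upgrade.",
     "code": "reports a performance regression"},
    {"id": 2, "text": "Not slow at all once the cache is warm.",
     "code": "disputes a performance complaint"},
    {"id": 3, "text": "CI takes forever since the last release.",
     "code": "reports a performance regression"},
]


def group_by_keyword(comments, keywords):
    """Group comment ids by which keywords appear in the text (surface form)."""
    groups = defaultdict(list)
    for comment in comments:
        words = set(re.findall(r"[a-z']+", comment["text"].lower()))
        for keyword in keywords:
            if keyword in words:
                groups[keyword].append(comment["id"])
    return dict(groups)


def group_by_code(comments):
    """Group comment ids by the code a human reader assigned (meaning)."""
    groups = defaultdict(list)
    for comment in comments:
        groups[comment["code"]].append(comment["id"])
    return dict(groups)


by_word = group_by_keyword(comments, ["slow"])
by_code = group_by_code(comments)
print(by_word)
print(by_code)
```

Grouping on the word "slow" puts comments 1 and 2 together even though they disagree, and leaves out comment 3, which makes the same complaint as comment 1 without using the word. Grouping by code pairs 1 with 3 and keeps 2 separate.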

Why is automatic sentiment analysis (positive, neutral, negative) not a substitute for thematic analysis?

Sentiment classifiers answer one narrow question: is this text expressing a positive or negative attitude? Thematic analysis answers a much richer set of questions: what are people actually saying, what arguments are they making, what concerns recur, and how do different kinds of responses relate to each other? A comment classified as "negative" by a sentiment model might be a technical objection, a personal complaint, a normative disagreement, or a request for clarification. Sentiment analysis collapses all of those into a single label and discards the distinctions that tell you what the commenter is objecting to or asking for.
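The collapse can be shown with a few hand-labeled rows. This sketch does not call any sentiment library: the sentiment labels stand in for what a classifier might output, and both the rows and the helper `codes_per_sentiment` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (sentiment_label, human_code) pairs. The sentiment labels are
# placeholders for a classifier's output, not results from a real model.
labeled_comments = [
    ("negative", "technical objection"),
    ("negative", "personal complaint"),
    ("negative", "normative disagreement"),
    ("negative", "request for clarification"),
    ("positive", "thanks the maintainer"),
]


def codes_per_sentiment(labeled_comments):
    """Return {sentiment: sorted list of distinct human codes under it}."""
    grouped = defaultdict(set)
    for sentiment, code in labeled_comments:
        grouped[sentiment].add(code)
    return {sentiment: sorted(codes) for sentiment, codes in grouped.items()}


collapsed = codes_per_sentiment(labeled_comments)
for sentiment, codes in collapsed.items():
    print(sentiment, len(codes), codes)
```

Here a single "negative" label covers four different kinds of comment, so a count of negative comments alone cannot say which of the four is common.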

Aghajani et al. read 878 artifacts rather than using keyword search. What would keyword search have missed?

Keyword search finds text that contains a specific word or phrase. It would have missed every case where a developer described a documentation problem using non-standard vocabulary, hedged language, or indirect phrasing. It also cannot identify the absence of information: "incomplete examples" is a theme about what documentation fails to include, and no keyword signals that absence. More broadly, keyword search cannot group things by meaning. It groups them by surface form, which often produces the wrong groupings and misses real ones.

Exercises

Coding Stack Overflow comments

Read ten Stack Overflow comments on any Python or JavaScript question. Generate at least five initial codes describing what the comments are doing, for example: "corrects an error in the accepted answer," "asks for clarification," "provides an alternative solution," "warns about a version-specific behavior." Group your codes into two or three candidate themes, and write one sentence describing each theme. For at least one theme, identify a comment that almost fits but that you decided to exclude, and explain why.

Finding examples of a documentation theme

Aghajani et al. identified a taxonomy of documentation issues that includes categories such as "incorrect documentation," "incomplete examples," and "missing rationale." Pick one category from their taxonomy, then find three real examples of it in the documentation of any open-source library you have used. For each example, write one sentence explaining why it fits the category you chose, and one sentence describing how a reader encountering that documentation problem would be affected.

When sentiment scoring goes wrong

A study classifies GitHub issue comments as "positive," "neutral," or "negative" using an off-the-shelf sentiment analysis library. Identify two types of comments where automatic sentiment scoring would assign the wrong label, and write one sentence for each explaining why the classifier fails. For each case, describe what a human coder would need to know to assign the correct label, and why the classifier cannot get that knowledge from the text alone.

Documenting an interpretive choice

Thematic analysis requires the researcher to make choices that another researcher might make differently. Return to the ten Stack Overflow comments you coded in the first exercise and identify one decision you made about where to draw a boundary between two codes or two themes. Describe the decision in one sentence, explain in one sentence why a different researcher might have drawn that boundary differently, and write two sentences describing how you would document that choice in a published paper so that a reader could evaluate your reasoning.

Evaluating threats to validity in a qualitative paper

Find the threats-to-validity section of any qualitative SE paper. Identify the threat the author discusses most prominently and write one sentence summarizing it. Write two more sentences evaluating whether the proposed mitigation is convincing: what does it address, and what does it leave unresolved? Then write a prompt you could give to an LLM to help you identify threats to validity in a qualitative methods section:

[your prompt here]