Qualitative Methods: Interviews and Surveys

Qualitative methods are how you find out what is actually going on when numbers alone cannot tell you. For example, in 2024, a team led by Jenny Liang surveyed 410 professional developers across industries about their experience with AI programming assistants [Liang2024]. They did not just ask "is this useful?" They asked about specific scenarios, frustrations, workarounds, and the kinds of tasks where developers had learned to distrust the tools. The result was a detailed map of where AI assistance actually helps and where it gets in the way—knowledge that would have been invisible to a study that only measured task completion times.

A smaller but deeper example: Johnson et al. interviewed 20 developers to learn why static analysis tools are so rarely used despite being widely available [Johnson2013]. Every participant believed the tools were beneficial, but false positives and poor warning presentation were the main barriers to adoption. Where Liang provides breadth (410 developers across industries), Johnson provides depth (20 developers with detailed follow-up), which illustrates the scale-versus-richness trade-off between surveys and interviews.

When Qualitative Methods Are the Right Choice

Designing Interviews

Designing Surveys

Thematic Analysis

Experience Sampling

Triangulation

Grounded Theory

Ethical Considerations

Misconceptions

Leading questions don't bias interview results if you're aware of them.
Your awareness does not change what the participant hears: a leading question still pushes them toward the expected answer. "Most developers find AI helpful, would you agree?" presupposes the answer. Rewrite it as an open question: "What has your experience been with AI coding tools?"
Surveying your own team or users is sufficient for internal research.
Convenience sampling gives you data about a specific group that may not represent the population you want to generalize to. If only people who already agree with you respond, the results will confirm your assumptions.
You can stop analyzing data once you have enough to support your hypothesis.
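Confirmation bias pushes researchers to stop analysis as soon as the data supports their expectations. Rigorous qualitative work continues until saturation, the point at which additional data no longer produces new codes or themes, and it actively looks for cases that contradict the emerging explanation.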
A study of 20 developers at one company can describe "what developers think."
Overgeneralization turns findings from a specific sample into claims about a broader population. "Developers find AI tools frustrating" from a study of 20 junior developers at one company is overgeneralized. The claim should specify who, where, and under what conditions.
A low response rate doesn't matter if you have enough responses.
If only 10% of people surveyed respond, the 90% who did not may have very different views. Non-response bias can be larger than sampling error, and a high absolute number of responses does not correct for it.
Qualitative research is just asking people what they think.
Rigorous qualitative work involves systematic data collection, structured analysis, documentation of decisions, and assessment of intercoder reliability. Asking a few colleagues over lunch is not a study.
More interviews always produce better qualitative results.
Depth matters more than volume. A study with twenty rich, well-analyzed interviews reaching saturation is more informative than one with a hundred superficial ones that never probe below the surface.
A high response count makes a survey representative.
Representativeness depends on who responds relative to who you want to generalize to, not on the raw number of responses. A million responses from a self-selected online audience are still a biased convenience sample.
Thematic analysis is subjective and therefore unreliable.
The subjectivity of interpretation is a known feature, not a flaw. Qualitative researchers manage it through audit trails, multiple coders, and transparent documentation of how codes and themes were derived.
Any descriptive label counts as a valid code.
A code named "AI" or "trust" is a noun bucket, not an analysis. Good codes capture what a participant is doing: "switching off AI suggestions after a bad experience" tells you something; "negative AI attitude" does not. The same principle applies to themes: a theme that cannot be expressed as a claim is a filing category, not a finding.
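
Intercoder reliability and the use of multiple coders, both mentioned in the misconceptions above, are usually reported as an agreement statistic. The following is a minimal sketch, assuming two coders who each assigned exactly one code to the same set of segments. It computes Cohen's kappa, a common chance-corrected agreement measure chosen here for illustration; the function name and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders (one code per segment)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(coder_a)

    # Observed agreement: fraction of segments given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Expected agreement if each coder assigned codes at their own base rates.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten interview segments.
coder_a = ["distrust"] * 5 + ["workaround"] * 5
coder_b = ["distrust"] * 4 + ["workaround"] * 5 + ["distrust"]

print(round(cohens_kappa(coder_a, coder_b), 2))
```

With these labels the coders agree on 8 of 10 segments, but half of that agreement would be expected by chance, so kappa comes out at about 0.6 rather than 0.8. Disagreements are a prompt to discuss and refine the code definitions, not just a number to report.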

Check Understanding

What is the difference between open coding and axial coding in thematic analysis?

Open coding is the first pass through the data, where you tag individual segments with descriptive labels close to what the participant actually said. Axial coding is a second-order process where you group those labels into higher-level themes and begin to examine how the themes relate to each other. Open coding is inductive and close to the data; axial coding is more interpretive and moves toward an explanatory structure.
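
The two passes can be kept as plain data, which also gives you a simple audit trail from each theme back to the quotes that support it. This is only a sketch: the participants, quotes, codes, and themes are invented for illustration, and the grouping in `theme_of` stands in for the analyst's judgment, which no script can supply.

```python
from collections import defaultdict

# First pass (open coding): each segment gets a label close to what was said.
# All participants, quotes, and codes here are hypothetical.
segments = [
    ("P1", "I turn the suggestions off when I'm debugging",
     "disabling suggestions while debugging"),
    ("P2", "I reread every line it writes before I commit",
     "reviewing generated code line by line"),
    ("P3", "It saves me looking up argument order",
     "using AI to recall API details"),
    ("P1", "I let it write the test scaffolding",
     "delegating boilerplate"),
]

# Second pass (axial coding): the analyst groups open codes under themes.
# Each theme is phrased as a claim, not a one-word bucket.
theme_of = {
    "disabling suggestions while debugging": "Developers limit AI use where mistakes are costly",
    "reviewing generated code line by line": "Developers limit AI use where mistakes are costly",
    "using AI to recall API details": "Developers hand off low-risk, familiar tasks",
    "delegating boilerplate": "Developers hand off low-risk, familiar tasks",
}


def group_by_theme(segments, theme_of):
    """Collect (participant, code, quote) evidence under each theme."""
    themes = defaultdict(list)
    for participant, quote, code in segments:
        themes[theme_of[code]].append((participant, code, quote))
    return dict(themes)


for theme, evidence in group_by_theme(segments, theme_of).items():
    print(theme)
    for participant, code, quote in evidence:
        print(f'  [{participant}] {code}: "{quote}"')
```

Keeping the quote attached to every code makes it easy to check whether a theme is supported by several participants or rests on a single voice.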

A researcher surveyed developers by posting a link in a popular programming subreddit and got 800 responses. They concluded that "the majority of developers are satisfied with AI coding tools." Identify two specific problems with this conclusion.

Any two of the following would do:

  1. The sample is a convenience sample skewed toward developers who actively participate in that community. They are likely to be more technically engaged than average and more likely to already use AI tools.
  2. The phrase "majority of developers" implies generalizability to a population (all developers) that the sample does not represent. The conclusion should be "the majority of respondents to this survey were satisfied", which is a much weaker but more accurate claim.
  3. People who are satisfied are more likely to respond to a survey about satisfaction, so satisfied developers are overrepresented among the 800 responses.
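
The third problem can be made concrete with a little arithmetic. The sketch below assumes invented numbers: 40% of the population is actually satisfied, and satisfied developers are three times as likely to answer the survey as unsatisfied ones. It compares the resulting bias with the usual 95% margin of error for 800 responses, which itself assumes simple random sampling.

```python
import math


def observed_share(true_share, respond_if_satisfied, respond_if_not):
    """Share of respondents who are satisfied, given unequal response rates."""
    satisfied = true_share * respond_if_satisfied
    unsatisfied = (1 - true_share) * respond_if_not
    return satisfied / (satisfied + unsatisfied)


def margin_of_error(share, n, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    return z * math.sqrt(share * (1 - share) / n)


# All three inputs are assumptions made up for this example.
true_share = 0.40
share = observed_share(true_share, respond_if_satisfied=0.03, respond_if_not=0.01)
moe = margin_of_error(share, n=800)

print(f"true share satisfied:     {true_share:.1%}")
print(f"observed among responses: {share:.1%}")
print(f"reported margin of error: {moe:.1%}")
print(f"bias from self-selection: {share - true_share:.1%}")
```

Under these assumptions about two thirds of respondents report being satisfied even though only a minority of the population is, and the bias is several times larger than the margin of error. Collecting more responses through the same channel would shrink the margin of error but leave the bias untouched.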

Why is "Would you use an AI coding assistant if it were integrated into your IDE?" a poor interview question?

It is a hypothetical question, and people are unreliable predictors of their own future behavior. Developers may say yes because the scenario sounds appealing, but their actual behavior when faced with the tool may differ significantly. A better question asks about past or present behavior: "Tell me about the last time you used an AI coding assistant. What did you do with the suggestion it gave you?"

The following interview question contains a flaw. Identify it and rewrite the question: "Given that AI tools can generate boilerplate code automatically, how much time do you think you save using them?"

The question is leading: its opening clause ("Given that AI tools can generate boilerplate code automatically") presupposes that the tools save time, and it then asks the participant only to quantify that saving. A respondent who does not save time, or who finds that the tools slow them down, is implicitly pushed toward a positive answer. A better version is, "When you use AI coding tools, what effect do they have on how long tasks take you? Can you give me a specific recent example?"

Exercises

Write an Interview Guide (20 minutes)

Write a semi-structured interview guide (6-8 questions plus follow-up probes) for a study on how developers decide when to accept or reject AI code suggestions. Include at least one open question and one probe; identify one question from your first draft that was leading and explain how you revised it.

Code This Excerpt (15 minutes)

Apply open coding to the following interview excerpt. Identify at least four distinct codes, quote the specific text that led to each code, and then group your codes into two higher-level themes.

I use it mostly for stuff I already know how to do—like if I need to write a regex or remember the syntax for something in a library I don't use often. But for the core logic of whatever I'm building, I don't trust it. It'll give you something that looks right but misses an edge case, and you won't notice until production. I've started just not using it for anything security-related at all.

Evaluate a Survey (15 minutes)

Find the methods section of [Liang2024] or another published survey of developer experience with AI tools. Identify the sampling strategy, the response rate (if reported), and one specific design choice that reduces bias. Then identify one limitation the authors acknowledge and one they do not.