Data Science for Software Engineers

This tutorial is a short introduction to data science for software engineers. It uses questions and data from software engineering to introduce common statistical tools and methods. Contributions are welcome; please note that all contributors must abide by our Code of Conduct.

Day 1: Working with Real Data

  1. Why Should You Care What Researchers Found?
  2. Tidy Data and Polars Basics
  3. Grouping, Aggregating, and Joining
  4. Visualization with Altair
  5. Descriptive Statistics
  6. Lab: Python Coding Style at Scale

Day 2: Testing Claims

  1. The Logic of Hypothesis Testing
  2. Comparing Two Groups
  3. Effect Size and Practical Significance
  4. Correlation and Prediction
  5. Confounds, Bias, and Threats to Validity
  6. Lab: Does Test-Driven Development Work?

Day 3: Mining Repositories and the Impact of AI

  1. Mining Software Repositories
  2. Non-Parametric Methods and Rank-Based Tests
  3. Study Design and Replication
  4. AI Tools and Software Engineering
  5. Reading and Critiquing the Literature
  6. Capstone: Design Your Own Study

Day 4: Analyzing Qualitative Data

  1. What Qualitative Data Looks Like in SE Research
  2. Thematic Analysis
  3. Coding Schemes and Inter-Rater Reliability
  4. Grounded Theory: A Practical Introduction
  5. Interviews and Survey Open-Ends
  6. Mixed Methods

Appendices

  1. License
  2. Code of Conduct
  3. Contributing
  4. Bibliography
  5. Glossary
  6. Datasets

start where you are · use what you have · help who you can