Data Science for Software Engineers
This tutorial is a short introduction to data science for software engineers. It uses software engineering questions and data to introduce common statistical tools and methods. Contributions are welcome; all contributors must abide by our Code of Conduct.
Day 1: Working with Real Data
- Why Should You Care What Researchers Found?
- Tidy Data and Polars Basics
- Grouping, Aggregating, and Joining
- Visualization with Altair
- Descriptive Statistics
- Lab: Python Coding Style at Scale
Day 2: Testing Claims
- The Logic of Hypothesis Testing
- Comparing Two Groups
- Effect Size and Practical Significance
- Correlation and Prediction
- Confounds, Bias, and Threats to Validity
- Lab: Does Test-Driven Development Work?
Day 3: Mining Repositories and the Impact of AI
- Mining Software Repositories
- Non-Parametric Methods and Rank-Based Tests
- Study Design and Replication
- AI Tools and Software Engineering
- Reading and Critiquing the Literature
- Capstone: Design Your Own Study
Day 4: Analyzing Qualitative Data
Appendices
start where you are · use what you have · help who you can