Conclusion

Goals

The Workflow You Built

Look back at the code and charts you produced over twelve sessions. What steps appear in every session?

When LLMs Fail

What do I do when the LLM produces code that crashes, or code that runs but gives the wrong answer?

The Environmental and Labor Costs

Given everything this course has covered, how should I decide when to use an LLM?

Where to Go Next

What should I learn after finishing this course?

Check Understanding

A colleague says "I don't need to check the LLM's output because it's always right for standard tasks." What is the flaw in this reasoning, and how would you explain it?

The LLM produces plausible text, not correct answers. For standard tasks it is right most of the time, which is what makes unchecked errors dangerous: they look like correct output. A unit error, a wrong column name, or a dropped null can all produce results that appear reasonable without a sanity check. The right approach is to check every result against something the LLM did not generate, such as a known row or a manual calculation.
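
A minimal sketch of that kind of check, using Polars as in the course sessions. The file name and column names here are placeholders, not from any session:

    import math

    import polars as pl

    # Placeholder file and columns, for illustration only.
    df = pl.read_csv("measurements.csv")

    # Suppose this summary is the LLM-generated code's output:
    # mean value per group.
    summary = df.group_by("group").agg(
        pl.col("value").mean().alias("mean_value")
    )

    # A check the LLM did not generate: recompute one group's mean
    # by hand from the raw rows and compare.
    group_a = df.filter(pl.col("group") == "A")
    manual_mean = sum(group_a["value"].to_list()) / group_a.height
    llm_mean = summary.filter(pl.col("group") == "A")["mean_value"][0]
    assert math.isclose(manual_mean, llm_mean, rel_tol=1e-9), (
        "summary disagrees with manual check"
    )

The check is deliberately dumb: it uses plain Python arithmetic rather than the same Polars expression, so a mistake in the generated code cannot hide in the verification.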

You want to share your analysis with a collaborator so they can re-run it. List three things they will need, and one thing that can silently break even when all three are present.

They need: the data file, the notebook (written so its cells run correctly from top to bottom), and the same software versions (Polars, Altair, Python). Something that can silently break: if the data file is updated at its source (many government datasets are refreshed regularly), re-running the notebook will produce different results even with identical code. Record the date you downloaded the data and, if possible, save a copy alongside the notebook.
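
One way to record all of this in the notebook itself, sketched in Python. "data.csv" is a placeholder name, and the SHA-256 digest is one possible way to detect a silently refreshed file:

    import hashlib
    import sys
    from datetime import date

    import altair as alt
    import polars as pl

    # The versions your results depend on.
    print("Python:", sys.version.split()[0])
    print("Polars:", pl.__version__)
    print("Altair:", alt.__version__)

    # When you downloaded the data, and a fingerprint of exactly what you got.
    # If the source refreshes the file, the digest changes even though your
    # code has not.
    print("Downloaded:", date.today().isoformat())
    with open("data.csv", "rb") as f:
        print("SHA-256:", hashlib.sha256(f.read()).hexdigest())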

Explain the difference between statistical significance and practical significance, using an example from any session in this course.

Statistical significance means the result is unlikely to have arisen by chance given the sample size. Practical significance means the effect is large enough to matter in the real world. From session 11: a unit error can inflate numbers by a factor of nearly 300, making a practically trivial difference look enormous and leading you to treat noise as a finding. A result can be statistically significant (p-value well below 0.05) yet practically meaningless, or practically large yet statistically insignificant if the sample is small. Both kinds of significance matter; neither alone is enough.
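
A small simulation makes the distinction concrete. This sketch uses NumPy and SciPy, which this course did not cover, so treat it as illustration only: with a large enough sample, a practically trivial true difference of 0.1 on a scale of 100 becomes statistically significant.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=0)

    # Two groups whose true means differ by a practically trivial 0.1
    # on a scale of 100 (think of a test scored out of 100).
    a = rng.normal(loc=100.0, scale=15.0, size=1_000_000)
    b = rng.normal(loc=100.1, scale=15.0, size=1_000_000)

    t, p = stats.ttest_ind(a, b)
    print(f"observed difference: {b.mean() - a.mean():.3f}")  # about 0.1
    print(f"p-value: {p:.1e}")  # far below 0.05 at this sample size

Shrink the sample size to a few dozen per group and the same true difference stops being statistically detectable, which is the reverse failure: practically interpretable questions, underpowered data.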

What is one question from your own field that you could now answer with the tools from this course, and one that you could not? Explain the difference.

Answers will vary. A question this course equips you to answer: "Do two groups in my dataset have different mean values, and is that difference larger than what chance would produce?" A question it does not equip you to answer: "Will this model correctly classify new cases it has never seen?" (That requires machine learning, not just descriptive statistics.) The difference is between describing what is in the data and predicting what is not yet in it.
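
For the first kind of question, the course's tools answer it directly. A sketch with hypothetical data and placeholder column names:

    import polars as pl

    # Hypothetical data; "group" and "value" are placeholder names.
    df = pl.DataFrame({
        "group": ["A", "A", "B", "B", "B"],
        "value": [4.0, 6.0, 7.0, 9.0, 8.0],
    })

    # Descriptive: summarizes what is already in the data.
    print(df.group_by("group").agg(
        pl.col("value").mean().alias("mean_value")
    ))
    # A predictive question ("how will unseen cases behave?") would need
    # model fitting and validation, which this course did not cover.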

Exercises

Write Your Own Methods Section

Take any analysis you produced in this course and write a methods section for it, as if you were submitting it to a journal. Include the data source, the cleaning steps, the software used, and the statistical methods applied.

Find a Retraction

Search for a published paper in your field that was retracted because of a data or statistical error. Identify which step in the workflow introduced the error. Describe what check from this course would have caught it.

Teach It Back

Pick one statistical concept from this course (mean vs. median, confidence intervals, correlation vs. causation) and write a one-page explanation aimed at a first-year undergraduate with no statistics background. Use an example that is not from this course.