There’s a lot of good advice on the web about how to do code reviews, from Cohen et al.’s Best Kept Secrets of Peer Code Review to articles like these. What is missing is a set of curated examples that are accessible to novices, particularly novices who think of themselves as scientists first and programmers second. The code in projects like scikit-learn or dplyr is much more complex than what most scientists write, so pull requests in those repositories can be pretty intimidating. Novices would probably find feedback on fifty-line scripts at the level of an introductory college course more helpful, but feedback on course assignments is usually private and usually focuses on the statistics rather than the code.
To put together a tutorial on code review for aspiring data scientists, I would be grateful for pointers to publicly available examples of short programs in R or Python (say, 20 to 100 lines of code) that include the kinds of redundancies, workarounds, blind alleys, and outright mistakes that novices frequently make. RMarkdown files, Jupyter notebooks, or plain old scripts are all very welcome, and if you can share any feedback you have already received, that would be wonderful too.
I think a dozen examples of realistic code review will do more to help novices figure out what to look for and what to say about it than all the general principles in the world. I realize it’s uncomfortable to invite the whole world to watch you stumble, even anonymously, so everyone’s identity will be kept confidential. If you’re able to help, please get in touch.