Teaching Code Review

Posted

I have taught code review a few times to computer science students (but not yet to data scientists). The method I’ve used is:

  1. Give learners a half-page program and a short checklist (6-8 points) of things to look for.

  2. They have 5 minutes to go through the code and mark it up with numbers from the checklist (“this line violates rules 5 and 8”).

  3. We then go through it as a class so that they compare their evaluation with mine. This helps them understand how to interpret the points in the checklist.

  4. Repeat the exercise with another piece of code and the same checklist. Hopefully there will be closer agreement between their evaluation and mine now that they’ve seen how to interpret the checklist items.

  5. Give them a homework exercise with a larger piece of code (a full page, maybe longer) and a longer checklist (12-15 items).

Some of the items in the checklist can come straight from sources like Python’s PEP-8 style guide or the tidyverse style guide. Others should require higher-level thinking, e.g., “function should be split” or “functions should be combined”. For example, this short Python program is supposed to count stop words in a list of strings:

stops = ['a', 'A', 'the', 'The', 'and']

def count(ln):
    n = 0
    for i in range(len(ln)):
        line = ln[i]
        stuff = line.split()
        for word in stuff:
            # print(word)
            j = stops.count(word)
            if (j > 0) == True:
                n = n + 1
    return n

import sys

lines = sys.stdin.readlines()
# print('number of lines', len(lines))
n = count(lines)
print('number', n)

I expect learners to say that:

I would probably also point out that:

The last thing I do is show them what code looks like after problems have been fixed:

import sys


STOPS = {'a', 'A', 'the', 'The', 'and'}


def count(reader):
    n = 0
    for line in reader:
        for word in line.split():
            if word in STOPS:
                n += 1
    return n


if __name__ == '__main__':
    n = count(sys.stdin)
    print('number', n)

I haven’t (yet) had learners do code reviews with checklists on GitHub, but it seems like a natural next step.